[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-13 Thread Yaxun Liu via llvm-branch-commits


@@ -117,13 +117,44 @@ void test_update_dpp(global int* out, int arg1, int arg2)
 }
 
 // CHECK-LABEL: @test_ds_fadd
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fadd.f32(ptr addrspace(3) %out, float %src, i32 0, i32 0, i1 false)
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
+// CHECK: atomicrmw volatile fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src release, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acq_rel, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("agent") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("workgroup") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("wavefront") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("singlethread") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
 #if !defined(__SPIRV__)
 void test_ds_faddf(local float *out, float src) {
 #else
-void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
+  void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
 #endif
+
   *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, false);
+  *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, true);
+
+  // Test all orders.
+  *out = __builtin_amdgcn_ds_faddf(out, src, 1, 0, false);

yxsamliu wrote:

It would be better to use the predefined macros:
```
  // Define macros for the C11 / C++11 memory orderings
  Builder.defineMacro("__ATOMIC_RELAXED", "0");
  Builder.defineMacro("__ATOMIC_CONSUME", "1");
  Builder.defineMacro("__ATOMIC_ACQUIRE", "2");
  Builder.defineMacro("__ATOMIC_RELEASE", "3");
  Builder.defineMacro("__ATOMIC_ACQ_REL", "4");
  Builder.defineMacro("__ATOMIC_SEQ_CST", "5");

  // Define macros for the clang atomic scopes.
  Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0");
  Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1");
  Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2");
  Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3");
  Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4");

```

https://github.com/llvm/llvm-project/pull/95395
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-06-24 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

If a subtarget does not have this feature, do none of the atomic instructions work for fine-grained remote memory, including integer atomic add/xchg/cmpxchg?

https://github.com/llvm/llvm-project/pull/96442


[llvm-branch-commits] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-06-24 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

This needs some tests.

https://github.com/llvm/llvm-project/pull/96442


[llvm-branch-commits] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-06-24 Thread Yaxun Liu via llvm-branch-commits


@@ -788,6 +788,14 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+  : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
+  "HasAgentScopeFineGrainedRemoteMemoryAtomics",
+  "true",
+  "Agent (device) scoped atomic operations not directly supported by "

yxsamliu wrote:

The description feels a bit confusing, at least to me.

How about:

"Agent (device) scoped atomic operations, excluding those directly supported by PCIe (i.e., integer atomic add, exchange, and compare-and-swap), are functional for allocations in host or peer PCIe device memory."

https://github.com/llvm/llvm-project/pull/96442


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-06-27 Thread Yaxun Liu via llvm-branch-commits


@@ -49,7 +49,7 @@ void test_s_wait_event_export_ready() {
 }
 
 // CHECK-LABEL: @test_global_add_f32
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %{{.*}}, float %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, !amdgpu.ignore.denormal.mode !{{[0-9]+$}}

yxsamliu wrote:

Why is the memory order seq_cst? Does this generate the same ISA as before? Can we add a test that emits assembly directly from clang to make sure the ISA does not change?
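For context on the ordering question: the legacy intrinsic carried no explicit ordering, while the emitted atomicrmw is seq_cst, the strongest C11 order, and stronger orders can require extra cache-invalidate or wait instructions in the generated ISA. In C11 terms the two ends of the spectrum look like this (an illustration only; the builtin's actual semantics are target-defined):

```c
#include <stdatomic.h>

/* monotonic (relaxed): atomic, but imposes no ordering on other accesses. */
int add_relaxed(atomic_int *p, int x) {
    return atomic_fetch_add_explicit(p, x, memory_order_relaxed);
}

/* seq_cst: atomic and also a point in the single global total order; the
   backend may need extra synchronization instructions to honor it. */
int add_seq_cst(atomic_int *p, int x) {
    return atomic_fetch_add_explicit(p, x, memory_order_seq_cst);
}
```

Both return the value the atomic held before the addition; only the ordering guarantee, and hence potentially the emitted ISA, differs.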

https://github.com/llvm/llvm-project/pull/96872


[llvm-branch-commits] [clang] [CUDA][HIP] Fix template static member (PR #98544)

2024-07-11 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/98544


[llvm-branch-commits] [clang] [CUDA][HIP] Fix template static member (PR #98544)

2024-07-11 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> Looks like this patch includes #98543. You may want to exclude it from the 
> pull request.

done

https://github.com/llvm/llvm-project/pull/98544


[llvm-branch-commits] [clang] [CUDA][HIP] Fix template static member (PR #98544)

2024-07-11 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/98544


[llvm-branch-commits] [clang] [CUDA][HIP] Fix template static member (PR #98544)

2024-07-11 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

I thought this PR would merge into the main branch, but it only merged into my own branch. I will have to open another PR to merge it into main.

https://github.com/llvm/llvm-project/pull/98544


[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)

2024-07-19 Thread Yaxun Liu via llvm-branch-commits


@@ -106,106 +100,6 @@ define <2 x half> @flat_atomic_fadd_v2f16_rtn(ptr %ptr, <2 x half> %data) {
  ret <2 x half> %ret
}
 
-define amdgpu_kernel void @flat_atomic_fadd_v2bf16_noret(ptr %ptr, <2 x i16> %data) {
-; GFX940-LABEL: flat_atomic_fadd_v2bf16_noret:
-; GFX940:   ; %bb.0:
-; GFX940-NEXT:s_load_dwordx2 s[0:1], s[2:3], 0x24
-; GFX940-NEXT:s_load_dword s4, s[2:3], 0x2c
-; GFX940-NEXT:s_waitcnt lgkmcnt(0)
-; GFX940-NEXT:v_mov_b64_e32 v[0:1], s[0:1]
-; GFX940-NEXT:v_mov_b32_e32 v2, s4
-; GFX940-NEXT:flat_atomic_pk_add_bf16 v[0:1], v2
-; GFX940-NEXT:s_endpgm
-  %ret = call <2 x i16> @llvm.amdgcn.flat.atomic.fadd.v2bf16.p0(ptr %ptr, <2 x i16> %data)

yxsamliu wrote:

Do we have equivalent codegen tests for the counterpart atomicrmw instructions to cover the removed tests? The same applies below.

https://github.com/llvm/llvm-project/pull/97050


[llvm-branch-commits] [llvm] IR/AMDGPU: Autoupgrade amdgpu-unsafe-fp-atomics attribute (PR #101698)

2024-08-07 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/101698


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (PR #96876)

2024-08-14 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/96876


[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)

2024-08-20 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/97050


[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)

2024-02-05 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/80716


[llvm-branch-commits] [clang] 22078bd - Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:04:55-05:00
New Revision: 22078bd9f6842411aac2b75196975d68a817a358

URL: https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358
DIFF: https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358.diff

LOG: Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b.

Revert this patch due to regression. A testcase is:

`template 
class C {
explicit C() {};
};

template <> C::C() {};
`

Added: 


Modified: 
clang/lib/Sema/SemaOverload.cpp
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/trivial-ctor-dtor.cu

Removed: 




diff  --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 64607e28b8b35e6..9800d7f1c9cfee9 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -1491,10 +1491,8 @@ static bool IsOverloadOrOverrideImpl(Sema &SemaRef, FunctionDecl *New,
 // Don't allow overloading of destructors.  (In theory we could, but it
 // would be a giant change to clang.)
 if (!isa(New)) {
-  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(
-   New, isa(New)),
-   OldTarget = SemaRef.IdentifyCUDATarget(
-   Old, isa(New));
+  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(New),
+   OldTarget = SemaRef.IdentifyCUDATarget(Old);
   if (NewTarget != Sema::CFT_InvalidTarget) {
 assert((OldTarget != Sema::CFT_InvalidTarget) &&
"Unexpected invalid target.");

diff  --git a/clang/test/SemaCUDA/implicit-member-target-inherited.cu b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
index ceca0891fc9b03c..781199bba6b5a11 100644
--- a/clang/test/SemaCUDA/implicit-member-target-inherited.cu
+++ b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
@@ -39,7 +39,6 @@ struct A2_with_device_ctor {
 };
 // expected-note@-3 {{candidate constructor (the implicit copy constructor) not viable}}
 // expected-note@-4 {{candidate constructor (the implicit move constructor) not viable}}
-// expected-note@-4 {{candidate inherited constructor not viable: call to __device__ function from __host__ function}}
 
 struct B2_with_implicit_default_ctor : A2_with_device_ctor {
   using A2_with_device_ctor::A2_with_device_ctor;

diff  --git a/clang/test/SemaCUDA/trivial-ctor-dtor.cu b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
index 21d698d28492ac3..1df8adc62bab590 100644
--- a/clang/test/SemaCUDA/trivial-ctor-dtor.cu
+++ b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
@@ -38,19 +38,3 @@ struct TC : TB {
 };
 
 __device__ TC tc; //expected-error {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
-
-// Check trivial ctor specialization
-template 
-struct C { //expected-note {{candidate constructor (the implicit copy constructor) not viable}}
-   //expected-note@-1 {{candidate constructor (the implicit move constructor) not viable}}
-explicit C() {};
-};
-
-template <> C::C() {};
-__device__ C ci_d;
-C ci_h;
-
-// Check non-trivial ctor specialization
-template <> C::C() { static int nontrivial_ctor = 1; } //expected-note {{candidate constructor not viable: call to __host__ function from __device__ function}}
-__device__ C cf_d; //expected-error {{no matching constructor for initialization of 'C'}}
-C cf_h;





[llvm-branch-commits] [clang] 6b3470b - Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:20:53-05:00
New Revision: 6b3470b4b83195aeeda60b101e8d3bf8800c321c

URL: https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c
DIFF: https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c.diff

LOG: Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e.

This patch is reverted due to regression. A testcase is:

`template 
struct ptr {
~ptr() { static int x = 1;}
};

template 
struct Abc : ptr {
 public:
  Abc();
  ~Abc() {}
};

template
class Abc;
`

Added: 


Modified: 
clang/include/clang/Sema/Sema.h
clang/lib/Sema/SemaCUDA.cpp
clang/lib/Sema/SemaDecl.cpp
clang/test/SemaCUDA/call-host-fn-from-device.cu
clang/test/SemaCUDA/default-ctor.cu
clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
clang/test/SemaCUDA/implicit-member-target-collision.cu
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/implicit-member-target.cu

Removed: 
clang/test/SemaCUDA/trivial-ctor-dtor.cu



diff  --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 59806bcbcbb2dbc..e8914f5fcddf19e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -13466,10 +13466,6 @@ class Sema final {
   void maybeAddCUDAHostDeviceAttrs(FunctionDecl *FD,
const LookupResult &Previous);
 
-  /// May add implicit CUDAHostAttr and CUDADeviceAttr attributes to a
-  /// trivial cotr/dtor that does not have host and device attributes.
-  void maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD);
-
   /// May add implicit CUDAConstantAttr attribute to VD, depending on VD
   /// and current compilation settings.
   void MaybeAddCUDAConstantAttr(VarDecl *VD);

diff  --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index b94f448dabe7517..318174f7be8fa95 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -772,22 +772,6 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD,
   NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
 }
 
-// If a trivial ctor/dtor has no host/device
-// attributes, make it implicitly host device function.
-void Sema::maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD) {
-  bool IsTrivialCtor = false;
-  if (auto *CD = dyn_cast(FD))
-IsTrivialCtor = isEmptyCudaConstructor(SourceLocation(), CD);
-  bool IsTrivialDtor = false;
-  if (auto *DD = dyn_cast(FD))
-IsTrivialDtor = isEmptyCudaDestructor(SourceLocation(), DD);
-  if ((IsTrivialCtor || IsTrivialDtor) && !FD->hasAttr() &&
-  !FD->hasAttr()) {
-FD->addAttr(CUDAHostAttr::CreateImplicit(Context));
-FD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
-  }
-}
-
 // TODO: `__constant__` memory may be a limited resource for certain targets.
 // A safeguard may be needed at the end of compilation pipeline if
 // `__constant__` memory usage goes beyond limit.

diff  --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 4e1857b931cc868..23dd8ae15c16583 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -16255,9 +16255,6 @@ Decl *Sema::ActOnFinishFunctionBody(Decl *dcl, Stmt *Body,
   if (FD && !FD->isDeleted())
 checkTypeSupport(FD->getType(), FD->getLocation(), FD);
 
-  if (LangOpts.CUDA)
-maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FD);
-
   return dcl;
 }
 

diff  --git a/clang/test/SemaCUDA/call-host-fn-from-device.cu b/clang/test/SemaCUDA/call-host-fn-from-device.cu
index b62de92db02d6de..acdd291b664579b 100644
--- a/clang/test/SemaCUDA/call-host-fn-from-device.cu
+++ b/clang/test/SemaCUDA/call-host-fn-from-device.cu
@@ -12,7 +12,7 @@ extern "C" void host_fn() {}
 struct Dummy {};
 
 struct S {
-  S() { static int nontrivial_ctor = 1; }
+  S() {}
   // expected-note@-1 2 {{'S' declared here}}
   ~S() { host_fn(); }
   // expected-note@-1 {{'~S' declared here}}

diff  --git a/clang/test/SemaCUDA/default-ctor.cu b/clang/test/SemaCUDA/default-ctor.cu
index 31971fe6b3863c7..cbad7a1774c1501 100644
--- a/clang/test/SemaCUDA/default-ctor.cu
+++ b/clang/test/SemaCUDA/default-ctor.cu
@@ -25,7 +25,7 @@ __device__ void fd() {
   InD ind;
  InH inh; // expected-error{{no matching constructor for initialization of 'InH'}}
  InHD inhd;
-  Out out;
+  Out out; // expected-error{{no matching constructor for initialization of 'Out'}}
  OutD outd;
  OutH outh; // expected-error{{no matching constructor for initialization of 'OutH'}}
  OutHD outhd;

diff  --git a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu b/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
index edb543f637ccc18..06015ed0d6d8edc 100644
--- a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
+++ b/clang/

[llvm-branch-commits] [clang] 22078bd - Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:04:55-05:00
New Revision: 22078bd9f6842411aac2b75196975d68a817a358

URL: 
https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358
DIFF: 
https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358.diff

LOG: Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b.

Revert this patch due to regression. A testcase is:

`template 
class C {
explicit C() {};
};

template <> C::C() {};
`

Added: 


Modified: 
clang/lib/Sema/SemaOverload.cpp
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/trivial-ctor-dtor.cu

Removed: 




diff  --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 64607e28b8b35e6..9800d7f1c9cfee9 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -1491,10 +1491,8 @@ static bool IsOverloadOrOverrideImpl(Sema &SemaRef, 
FunctionDecl *New,
 // Don't allow overloading of destructors.  (In theory we could, but it
 // would be a giant change to clang.)
 if (!isa(New)) {
-  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(
-   New, isa(New)),
-   OldTarget = SemaRef.IdentifyCUDATarget(
-   Old, isa(New));
+  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(New),
+   OldTarget = SemaRef.IdentifyCUDATarget(Old);
   if (NewTarget != Sema::CFT_InvalidTarget) {
 assert((OldTarget != Sema::CFT_InvalidTarget) &&
"Unexpected invalid target.");

diff  --git a/clang/test/SemaCUDA/implicit-member-target-inherited.cu 
b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
index ceca0891fc9b03c..781199bba6b5a11 100644
--- a/clang/test/SemaCUDA/implicit-member-target-inherited.cu
+++ b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
@@ -39,7 +39,6 @@ struct A2_with_device_ctor {
 };
 // expected-note@-3 {{candidate constructor (the implicit copy constructor) 
not viable}}
 // expected-note@-4 {{candidate constructor (the implicit move constructor) 
not viable}}
-// expected-note@-4 {{candidate inherited constructor not viable: call to 
__device__ function from __host__ function}}
 
 struct B2_with_implicit_default_ctor : A2_with_device_ctor {
   using A2_with_device_ctor::A2_with_device_ctor;

diff  --git a/clang/test/SemaCUDA/trivial-ctor-dtor.cu 
b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
index 21d698d28492ac3..1df8adc62bab590 100644
--- a/clang/test/SemaCUDA/trivial-ctor-dtor.cu
+++ b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
@@ -38,19 +38,3 @@ struct TC : TB {
 };
 
 __device__ TC tc; //expected-error {{dynamic initialization is not 
supported for __device__, __constant__, __shared__, and __managed__ variables}}
-
-// Check trivial ctor specialization
-template 
-struct C { //expected-note {{candidate constructor (the implicit copy 
constructor) not viable}}
-   //expected-note@-1 {{candidate constructor (the implicit move 
constructor) not viable}}
-explicit C() {};
-};
-
-template <> C::C() {};
-__device__ C ci_d;
-C ci_h;
-
-// Check non-trivial ctor specialization
-template <> C::C() { static int nontrivial_ctor = 1; } //expected-note 
{{candidate constructor not viable: call to __host__ function from __device__ 
function}}
-__device__ C cf_d; //expected-error {{no matching constructor for 
initialization of 'C'}}
-C cf_h;



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 6b3470b - Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:20:53-05:00
New Revision: 6b3470b4b83195aeeda60b101e8d3bf8800c321c

URL: 
https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c
DIFF: 
https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c.diff

LOG: Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e.

This patch is reverted due to regression. A testcase is:

`template 
struct ptr {
~ptr() { static int x = 1;}
};

template 
struct Abc : ptr {
 public:
  Abc();
  ~Abc() {}
};

template
class Abc;
`

Added: 


Modified: 
clang/include/clang/Sema/Sema.h
clang/lib/Sema/SemaCUDA.cpp
clang/lib/Sema/SemaDecl.cpp
clang/test/SemaCUDA/call-host-fn-from-device.cu
clang/test/SemaCUDA/default-ctor.cu
clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
clang/test/SemaCUDA/implicit-member-target-collision.cu
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/implicit-member-target.cu

Removed: 
clang/test/SemaCUDA/trivial-ctor-dtor.cu



diff  --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 59806bcbcbb2dbc..e8914f5fcddf19e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -13466,10 +13466,6 @@ class Sema final {
   void maybeAddCUDAHostDeviceAttrs(FunctionDecl *FD,
const LookupResult &Previous);
 
-  /// May add implicit CUDAHostAttr and CUDADeviceAttr attributes to a
-  /// trivial cotr/dtor that does not have host and device attributes.
-  void maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD);
-
   /// May add implicit CUDAConstantAttr attribute to VD, depending on VD
   /// and current compilation settings.
   void MaybeAddCUDAConstantAttr(VarDecl *VD);

diff  --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index b94f448dabe7517..318174f7be8fa95 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -772,22 +772,6 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD,
   NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
 }
 
-// If a trivial ctor/dtor has no host/device
-// attributes, make it implicitly host device function.
-void Sema::maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD) {
-  bool IsTrivialCtor = false;
-  if (auto *CD = dyn_cast(FD))
-IsTrivialCtor = isEmptyCudaConstructor(SourceLocation(), CD);
-  bool IsTrivialDtor = false;
-  if (auto *DD = dyn_cast(FD))
-IsTrivialDtor = isEmptyCudaDestructor(SourceLocation(), DD);
-  if ((IsTrivialCtor || IsTrivialDtor) && !FD->hasAttr() &&
-  !FD->hasAttr()) {
-FD->addAttr(CUDAHostAttr::CreateImplicit(Context));
-FD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
-  }
-}
-
 // TODO: `__constant__` memory may be a limited resource for certain targets.
 // A safeguard may be needed at the end of compilation pipeline if
 // `__constant__` memory usage goes beyond limit.

diff  --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 4e1857b931cc868..23dd8ae15c16583 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -16255,9 +16255,6 @@ Decl *Sema::ActOnFinishFunctionBody(Decl *dcl, Stmt 
*Body,
   if (FD && !FD->isDeleted())
 checkTypeSupport(FD->getType(), FD->getLocation(), FD);
 
-  if (LangOpts.CUDA)
-maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FD);
-
   return dcl;
 }
 

diff  --git a/clang/test/SemaCUDA/call-host-fn-from-device.cu 
b/clang/test/SemaCUDA/call-host-fn-from-device.cu
index b62de92db02d6de..acdd291b664579b 100644
--- a/clang/test/SemaCUDA/call-host-fn-from-device.cu
+++ b/clang/test/SemaCUDA/call-host-fn-from-device.cu
@@ -12,7 +12,7 @@ extern "C" void host_fn() {}
 struct Dummy {};
 
 struct S {
-  S() { static int nontrivial_ctor = 1; }
+  S() {}
   // expected-note@-1 2 {{'S' declared here}}
   ~S() { host_fn(); }
   // expected-note@-1 {{'~S' declared here}}

diff  --git a/clang/test/SemaCUDA/default-ctor.cu 
b/clang/test/SemaCUDA/default-ctor.cu
index 31971fe6b3863c7..cbad7a1774c1501 100644
--- a/clang/test/SemaCUDA/default-ctor.cu
+++ b/clang/test/SemaCUDA/default-ctor.cu
@@ -25,7 +25,7 @@ __device__ void fd() {
   InD ind;
   InH inh; // expected-error{{no matching constructor for initialization of 
'InH'}}
   InHD inhd;
-  Out out;
+  Out out; // expected-error{{no matching constructor for initialization of 
'Out'}}
   OutD outd;
   OutH outh; // expected-error{{no matching constructor for initialization of 
'OutH'}}
   OutHD outhd;

diff  --git a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu 
b/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
index edb543f637ccc18..06015ed0d6d8edc 100644
--- a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
+++ b/clang/

[llvm-branch-commits] [clang] 22078bd - Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:04:55-05:00
New Revision: 22078bd9f6842411aac2b75196975d68a817a358

URL: 
https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358
DIFF: 
https://github.com/llvm/llvm-project/commit/22078bd9f6842411aac2b75196975d68a817a358.diff

LOG: Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"

This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b.

Revert this patch due to regression. A testcase is:

`template 
class C {
explicit C() {};
};

template <> C::C() {};
`

Added: 


Modified: 
clang/lib/Sema/SemaOverload.cpp
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/trivial-ctor-dtor.cu

Removed: 




diff  --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 64607e28b8b35e6..9800d7f1c9cfee9 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -1491,10 +1491,8 @@ static bool IsOverloadOrOverrideImpl(Sema &SemaRef, 
FunctionDecl *New,
 // Don't allow overloading of destructors.  (In theory we could, but it
 // would be a giant change to clang.)
 if (!isa(New)) {
-  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(
-   New, isa(New)),
-   OldTarget = SemaRef.IdentifyCUDATarget(
-   Old, isa(New));
+  Sema::CUDAFunctionTarget NewTarget = SemaRef.IdentifyCUDATarget(New),
+   OldTarget = SemaRef.IdentifyCUDATarget(Old);
   if (NewTarget != Sema::CFT_InvalidTarget) {
 assert((OldTarget != Sema::CFT_InvalidTarget) &&
"Unexpected invalid target.");

diff  --git a/clang/test/SemaCUDA/implicit-member-target-inherited.cu 
b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
index ceca0891fc9b03c..781199bba6b5a11 100644
--- a/clang/test/SemaCUDA/implicit-member-target-inherited.cu
+++ b/clang/test/SemaCUDA/implicit-member-target-inherited.cu
@@ -39,7 +39,6 @@ struct A2_with_device_ctor {
 };
 // expected-note@-3 {{candidate constructor (the implicit copy constructor) 
not viable}}
 // expected-note@-4 {{candidate constructor (the implicit move constructor) 
not viable}}
-// expected-note@-4 {{candidate inherited constructor not viable: call to 
__device__ function from __host__ function}}
 
 struct B2_with_implicit_default_ctor : A2_with_device_ctor {
   using A2_with_device_ctor::A2_with_device_ctor;

diff  --git a/clang/test/SemaCUDA/trivial-ctor-dtor.cu 
b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
index 21d698d28492ac3..1df8adc62bab590 100644
--- a/clang/test/SemaCUDA/trivial-ctor-dtor.cu
+++ b/clang/test/SemaCUDA/trivial-ctor-dtor.cu
@@ -38,19 +38,3 @@ struct TC : TB {
 };
 
 __device__ TC tc; //expected-error {{dynamic initialization is not 
supported for __device__, __constant__, __shared__, and __managed__ variables}}
-
-// Check trivial ctor specialization
-template 
-struct C { //expected-note {{candidate constructor (the implicit copy 
constructor) not viable}}
-   //expected-note@-1 {{candidate constructor (the implicit move 
constructor) not viable}}
-explicit C() {};
-};
-
-template <> C::C() {};
-__device__ C ci_d;
-C ci_h;
-
-// Check non-trivial ctor specialization
-template <> C::C() { static int nontrivial_ctor = 1; } //expected-note 
{{candidate constructor not viable: call to __host__ function from __device__ 
function}}
-__device__ C cf_d; //expected-error {{no matching constructor for 
initialization of 'C'}}
-C cf_h;



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 6b3470b - Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

2023-11-27 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2023-11-22T21:20:53-05:00
New Revision: 6b3470b4b83195aeeda60b101e8d3bf8800c321c

URL: 
https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c
DIFF: 
https://github.com/llvm/llvm-project/commit/6b3470b4b83195aeeda60b101e8d3bf8800c321c.diff

LOG: Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"

This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e.

This patch is reverted due to regression. A testcase is:

`template <typename T>
struct ptr {
~ptr() { static int x = 1;}
};

template <typename T>
struct Abc : ptr<T> {
 public:
  Abc();
  ~Abc() {}
};

template
class Abc<int>;
`

Added: 


Modified: 
clang/include/clang/Sema/Sema.h
clang/lib/Sema/SemaCUDA.cpp
clang/lib/Sema/SemaDecl.cpp
clang/test/SemaCUDA/call-host-fn-from-device.cu
clang/test/SemaCUDA/default-ctor.cu
clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
clang/test/SemaCUDA/implicit-member-target-collision.cu
clang/test/SemaCUDA/implicit-member-target-inherited.cu
clang/test/SemaCUDA/implicit-member-target.cu

Removed: 
clang/test/SemaCUDA/trivial-ctor-dtor.cu



diff  --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 59806bcbcbb2dbc..e8914f5fcddf19e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -13466,10 +13466,6 @@ class Sema final {
   void maybeAddCUDAHostDeviceAttrs(FunctionDecl *FD,
const LookupResult &Previous);
 
-  /// May add implicit CUDAHostAttr and CUDADeviceAttr attributes to a
-  /// trivial cotr/dtor that does not have host and device attributes.
-  void maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD);
-
   /// May add implicit CUDAConstantAttr attribute to VD, depending on VD
   /// and current compilation settings.
   void MaybeAddCUDAConstantAttr(VarDecl *VD);

diff  --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index b94f448dabe7517..318174f7be8fa95 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -772,22 +772,6 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD,
   NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
 }
 
-// If a trivial ctor/dtor has no host/device
-// attributes, make it implicitly host device function.
-void Sema::maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FunctionDecl *FD) {
-  bool IsTrivialCtor = false;
-  if (auto *CD = dyn_cast<CXXConstructorDecl>(FD))
-    IsTrivialCtor = isEmptyCudaConstructor(SourceLocation(), CD);
-  bool IsTrivialDtor = false;
-  if (auto *DD = dyn_cast<CXXDestructorDecl>(FD))
-    IsTrivialDtor = isEmptyCudaDestructor(SourceLocation(), DD);
-  if ((IsTrivialCtor || IsTrivialDtor) && !FD->hasAttr<CUDAHostAttr>() &&
-      !FD->hasAttr<CUDADeviceAttr>()) {
-    FD->addAttr(CUDAHostAttr::CreateImplicit(Context));
-    FD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
-  }
-}
-
 // TODO: `__constant__` memory may be a limited resource for certain targets.
 // A safeguard may be needed at the end of compilation pipeline if
 // `__constant__` memory usage goes beyond limit.

diff  --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 4e1857b931cc868..23dd8ae15c16583 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -16255,9 +16255,6 @@ Decl *Sema::ActOnFinishFunctionBody(Decl *dcl, Stmt *Body,
   if (FD && !FD->isDeleted())
 checkTypeSupport(FD->getType(), FD->getLocation(), FD);
 
-  if (LangOpts.CUDA)
-    maybeAddCUDAHostDeviceAttrsToTrivialCtorDtor(FD);
-
   return dcl;
 }
 

diff  --git a/clang/test/SemaCUDA/call-host-fn-from-device.cu 
b/clang/test/SemaCUDA/call-host-fn-from-device.cu
index b62de92db02d6de..acdd291b664579b 100644
--- a/clang/test/SemaCUDA/call-host-fn-from-device.cu
+++ b/clang/test/SemaCUDA/call-host-fn-from-device.cu
@@ -12,7 +12,7 @@ extern "C" void host_fn() {}
 struct Dummy {};
 
 struct S {
-  S() { static int nontrivial_ctor = 1; }
+  S() {}
   // expected-note@-1 2 {{'S' declared here}}
   ~S() { host_fn(); }
   // expected-note@-1 {{'~S' declared here}}

diff  --git a/clang/test/SemaCUDA/default-ctor.cu 
b/clang/test/SemaCUDA/default-ctor.cu
index 31971fe6b3863c7..cbad7a1774c1501 100644
--- a/clang/test/SemaCUDA/default-ctor.cu
+++ b/clang/test/SemaCUDA/default-ctor.cu
@@ -25,7 +25,7 @@ __device__ void fd() {
   InD ind;
  InH inh; // expected-error{{no matching constructor for initialization of 'InH'}}
   InHD inhd;
-  Out out;
+  Out out; // expected-error{{no matching constructor for initialization of 'Out'}}
   OutD outd;
  OutH outh; // expected-error{{no matching constructor for initialization of 'OutH'}}
   OutHD outhd;

diff  --git a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu 
b/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
index edb543f637ccc18..06015ed0d6d8edc 100644
--- a/clang/test/SemaCUDA/implicit-member-target-collision-cxx11.cu
+++ b/clang/

[llvm-branch-commits] [clang] 02d5b11 - [HIPSPV] Fix literals are mapped to Generic address space

2022-02-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2022-02-07T10:00:54-05:00
New Revision: 02d5b112138e7e9f30dec685afb380c1b9593a84

URL: 
https://github.com/llvm/llvm-project/commit/02d5b112138e7e9f30dec685afb380c1b9593a84
DIFF: 
https://github.com/llvm/llvm-project/commit/02d5b112138e7e9f30dec685afb380c1b9593a84.diff

LOG: [HIPSPV] Fix literals are mapped to Generic address space

This issue is an oversight in D108621.

Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.

The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be casted to Generic.

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D118876

Added: 


Modified: 
clang/lib/CodeGen/CodeGenModule.cpp
clang/test/CodeGenHIP/hipspv-addr-spaces.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 2346176a15628..29806b65e984e 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4381,6 +4381,14 @@ LangAS CodeGenModule::GetGlobalConstantAddressSpace() const {
     return LangAS::opencl_constant;
   if (LangOpts.SYCLIsDevice)
     return LangAS::sycl_global;
+  if (LangOpts.HIP && LangOpts.CUDAIsDevice && getTriple().isSPIRV())
+    // For HIPSPV map literals to cuda_device (maps to CrossWorkGroup in SPIR-V)
+    // instead of default AS (maps to Generic in SPIR-V). Otherwise, we end up
+    // with OpVariable instructions with Generic storage class which is not
+    // allowed (SPIR-V V1.6 s3.42.8). Also, mapping literals to SPIR-V
+    // UniformConstant storage class is not viable as pointers to it may not be
+    // casted to Generic pointers which are used to model HIP's "flat" pointers.
+    return LangAS::cuda_device;
   if (auto AS = getTarget().getConstantAddressSpace())
     return AS.getValue();
   return LangAS::Default;

diff  --git a/clang/test/CodeGenHIP/hipspv-addr-spaces.cpp 
b/clang/test/CodeGenHIP/hipspv-addr-spaces.cpp
index 8f56f2104ecbd..bde360eec8cd9 100644
--- a/clang/test/CodeGenHIP/hipspv-addr-spaces.cpp
+++ b/clang/test/CodeGenHIP/hipspv-addr-spaces.cpp
@@ -22,6 +22,9 @@ __device__ struct foo_t {
   int* pi;
 } foo;
 
+// Check literals are placed in address space 1 (CrossWorkGroup/__global).
+// CHECK: @.str ={{.*}} unnamed_addr addrspace(1) constant
+
 // CHECK: define{{.*}} spir_func noundef i32 addrspace(4)* @_Z3barPi(i32 addrspace(4)*
 __device__ int* bar(int *x) {
   return x;
@@ -44,3 +47,8 @@ __device__ int* baz_s() {
   // CHECK: ret i32 addrspace(4)* addrspacecast (i32 addrspace(3)* @s to i32 addrspace(4)*
   return &s;
 }
+
+// CHECK: define{{.*}} spir_func noundef i8 addrspace(4)* @_Z3quzv()
+__device__ const char* quz() {
+  return "abc";
+}





[llvm-branch-commits] [clang] 622eaa4 - [HIP] Support __managed__ attribute

2021-01-22 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2021-01-22T11:43:58-05:00
New Revision: 622eaa4a4cea17c2cec6942d9702b010deae392b

URL: 
https://github.com/llvm/llvm-project/commit/622eaa4a4cea17c2cec6942d9702b010deae392b
DIFF: 
https://github.com/llvm/llvm-project/commit/622eaa4a4cea17c2cec6942d9702b010deae392b.diff

LOG: [HIP] Support __managed__ attribute

This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.

Differential Revision: https://reviews.llvm.org/D94814

Added: 
clang/test/AST/Inputs/cuda.h
clang/test/AST/ast-dump-managed-var.cu
clang/test/CodeGenCUDA/managed-var.cu
clang/test/SemaCUDA/managed-var.cu
llvm/include/llvm/IR/ReplaceConstant.h
llvm/lib/IR/ReplaceConstant.cpp

Modified: 
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/AttrDocs.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/CodeGen/CGCUDANV.cpp
clang/lib/CodeGen/CGCUDARuntime.h
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/test/CodeGenCUDA/Inputs/cuda.h
clang/test/Misc/pragma-attribute-supported-attributes-list.test
clang/test/SemaCUDA/Inputs/cuda.h
clang/test/SemaCUDA/bad-attributes.cu
clang/test/SemaCUDA/device-var-init.cu
clang/test/SemaCUDA/function-overload.cu
clang/test/SemaCUDA/union-init.cu
llvm/lib/IR/CMakeLists.txt
llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/Attr.td 
b/clang/include/clang/Basic/Attr.td
index b30b91d3d4a6..bfd50f6a6779 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -324,6 +324,7 @@ class LangOpt {
 def MicrosoftExt : LangOpt<"MicrosoftExt">;
 def Borland : LangOpt<"Borland">;
 def CUDA : LangOpt<"CUDA">;
+def HIP : LangOpt<"HIP">;
 def SYCL : LangOpt<"SYCLIsDevice">;
 def COnly : LangOpt<"", "!LangOpts.CPlusPlus">;
 def CPlusPlus : LangOpt<"CPlusPlus">;
@@ -1115,6 +1116,13 @@ def CUDAHost : InheritableAttr {
   let Documentation = [Undocumented];
 }
 
+def HIPManaged : InheritableAttr {
+  let Spellings = [GNU<"managed">, Declspec<"__managed__">];
+  let Subjects = SubjectList<[Var]>;
+  let LangOpts = [HIP];
+  let Documentation = [HIPManagedAttrDocs];
+}
+
 def CUDAInvalidTarget : InheritableAttr {
   let Spellings = [];
   let Subjects = SubjectList<[Function]>;

diff  --git a/clang/include/clang/Basic/AttrDocs.td 
b/clang/include/clang/Basic/AttrDocs.td
index fffede41db1e..170a0fe3d4c4 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -5419,6 +5419,17 @@ unbind runtime APIs.
   }];
 }
 
+def HIPManagedAttrDocs : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+The ``__managed__`` attribute can be applied to a global variable declaration in HIP.
+A managed variable is emitted as an undefined global symbol in the device binary and is
+registered by ``__hipRegisterManagedVariable`` in init functions. The HIP runtime allocates
+managed memory and uses it to define the symbol when loading the device binary.
+A managed variable can be accessed in both device and host code.
+}
+
 def LifetimeOwnerDocs : Documentation {
   let Category = DocCatDecl;
   let Content = [{

diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 758b2ed3e90b..67c59f3ca09a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8237,7 +8237,7 @@ def err_cuda_device_exceptions : Error<
   "%select{__device__|__global__|__host__|__host__ __device__}1 function">;
 def err_dynamic_var_init : Error<
 "dynamic initialization is not supported for "
-"__device__, __constant__, and __shared__ variables.">;
+"__device__, __constant__, __shared__, and __managed__ variables.">;
 def err_shared_var_init : Error<
 "initialization is not supported for __shared__ variables.">;
 def err_cuda_vla : Error<
@@ -8247,7 +8247,8 @@ def err_cuda_extern_shared : Error<"__shared__ variable %0 cannot be 'extern'">;
 def err_cuda_host_shared : Error<
     "__shared__ local variables not allowed in "
     "%select{__device__|__global__|__host__|__host__ __device__}0 functions">;
-def err_cuda_nonstatic_constdev: Error<"__constant__ and __device__ are not allowed on non-static local variables">;
+def err_cuda_nonstatic_constdev: Error<"__constant__, __device__, and "
+    "__managed__ are not allowed on non-static local variables">;
 def err_cuda_ovl_target : Error<
   "%select{__device__|__global__|__host__|__host__ __device__}0 function %1 "
   "cannot overload %select{__device__|__global__|__host__|__host__ __device__}2 function %3">;

diff  --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 7c5ab39a85ec..33a2d6f4483e 100644
--- a/clang/lib/CodeGen/CGCUD

[llvm-branch-commits] [clang] 90bf3ec - [clang-offload-bundler] Add option -list

2021-01-06 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2021-01-06T16:23:01-05:00
New Revision: 90bf3ecef4bb1e214a718aebcee730c24199c8ba

URL: 
https://github.com/llvm/llvm-project/commit/90bf3ecef4bb1e214a718aebcee730c24199c8ba
DIFF: 
https://github.com/llvm/llvm-project/commit/90bf3ecef4bb1e214a718aebcee730c24199c8ba.diff

LOG: [clang-offload-bundler] Add option -list

clang-offload-bundler is not only used by the clang driver
to bundle/unbundle files for offloading toolchains,
but also by out-of-tree tools to unbundle
fat binaries generated by clang. It is important
to be able to list the bundle IDs in a bundled
file so that the bundles can be extracted.

This patch adds an option -list to list the bundle
IDs in a bundled file, one ID per line. If the file
is not a bundled file, nothing is output and the
tool returns 0.

Differential Revision: https://reviews.llvm.org/D92954

Added: 


Modified: 
clang/test/Driver/clang-offload-bundler.c
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff  --git a/clang/test/Driver/clang-offload-bundler.c 
b/clang/test/Driver/clang-offload-bundler.c
index b4bab6bbd1e8..3e1fab25d754 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -35,6 +35,7 @@
 // CK-HELP: {{.*}}USAGE: clang-offload-bundler [options]
 // CK-HELP: {{.*}}-allow-missing-bundles {{.*}}- Create empty files if bundles are missing when unbundling
 // CK-HELP: {{.*}}-inputs=<string>  - [<input file>,...]
+// CK-HELP: {{.*}}-list {{.*}}- List bundle IDs in the bundled file.
 // CK-HELP: {{.*}}-outputs=<string> - [<output file>,...]
 // CK-HELP: {{.*}}-targets=<string> - [<offload kind>-<target triple>,...]
 // CK-HELP: {{.*}}-type=<string>    - Type of the files to be bundled/unbundled.
@@ -54,7 +55,9 @@
 //
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -inputs=%t.i,%t.tgt1,%t.tgt2 -outputs=%t.bundle.i -unbundle 2>&1 | FileCheck %s --check-prefix CK-ERR1
 // CK-ERR1: error: only one input file supported in unbundling mode
-// CK-ERR1: error: number of output files and targets should match in unbundling mode
+
+// RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -inputs=%t.i -outputs=%t.bundle.i -unbundle 2>&1 | FileCheck %s --check-prefix CK-ERR1A
+// CK-ERR1A: error: number of output files and targets should match in unbundling mode
 
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu -inputs=%t.i,%t.tgt1,%t.tgt2 -outputs=%t.bundle.i 2>&1 | FileCheck %s --check-prefix CK-ERR2
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -inputs=%t.i,%t.tgt1 -outputs=%t.bundle.i 2>&1 | FileCheck %s --check-prefix CK-ERR2
@@ -62,7 +65,6 @@
 
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -outputs=%t.i,%t.tgt1,%t.tgt2 -inputs=%t.bundle.i 2>&1 | FileCheck %s --check-prefix CK-ERR3
 // CK-ERR3: error: only one output file supported in bundling mode
-// CK-ERR3: error: number of input files and targets should match in bundling mode
 
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu -outputs=%t.i,%t.tgt1,%t.tgt2 -inputs=%t.bundle.i -unbundle 2>&1 | FileCheck %s --check-prefix CK-ERR4
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -outputs=%t.i,%t.tgt1 -inputs=%t.bundle.i -unbundle 2>&1 | FileCheck %s --check-prefix CK-ERR4
@@ -76,19 +78,27 @@
 // CK-ERR6: error: '[[TYPE]]': invalid file type specified
 
 // RUN: not clang-offload-bundler 2>&1 | FileCheck %s --check-prefix CK-ERR7
-// CK-ERR7-DAG: clang-offload-bundler: for the --type option: must be specified at least once!
-// CK-ERR7-DAG: clang-offload-bundler: for the --inputs option: must be specified at least once!
-// CK-ERR7-DAG: clang-offload-bundler: for the --outputs option: must be specified at least once!
-// CK-ERR7-DAG: clang-offload-bundler: for the --targets option: must be specified at least once!
+// CK-ERR7: clang-offload-bundler: for the --type option: must be specified at least once!
+
+// RUN: not clang-offload-bundler -type=i -inputs=%t.i,%t.tgt1,%t.tgt2 2>&1 | FileCheck %s -check-prefix=CK-ERR7A
+// CK-ERR7A: error: for the --outputs option: must be specified at least once!
+
+// RUN: not clang-offload-bundler -type=i -inputs=%t.i,%t.tgt1,%t.tgt2 -outputs=%t.bundle.i 2>&1 | FileCheck %s -check-prefix=CK-ERR7B
+// CK-ERR7B: error: for the --targets option: must be specified at least once!
 
 // RUN: not clang-offload-bundler -type=i 
-targets=hxst-powerpcxxle-ibm-linux-gnu,openxp-pxxerpc64le-ib

[llvm-branch-commits] [clang] 4f14b80 - [HIP] unbundle bundled preprocessor output

2020-12-15 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-15T22:14:18-05:00
New Revision: 4f14b80803a458209b6b11daa3ec05076b8c4973

URL: 
https://github.com/llvm/llvm-project/commit/4f14b80803a458209b6b11daa3ec05076b8c4973
DIFF: 
https://github.com/llvm/llvm-project/commit/4f14b80803a458209b6b11daa3ec05076b8c4973.diff

LOG: [HIP] unbundle bundled preprocessor output

There is a use case where users want to emit the preprocessor
output to a file and compile that output later
with -x hip-cpp-output.

Clang emits bundled preprocessor output when users
compile with -E for combined host/device compilations.
Clang should be able to compile the bundled preprocessor
output with -x hip-cpp-output. Basically clang should
unbundle the bundled preprocessor output and launch
device and host compilation actions.

Currently there is a bug in the clang driver that causes bundled
preprocessor output not to be unbundled.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D92720

Added: 
clang/test/Driver/hip-unbundle-preproc.hip

Modified: 
clang/lib/Driver/Driver.cpp

Removed: 




diff  --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index dc9ec1b9c362..62fba30f3830 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -2460,8 +2460,9 @@ class OffloadingActionBuilder final {
 
     // If the host input is not CUDA or HIP, we don't need to bother about
     // this input.
-    if (IA->getType() != types::TY_CUDA &&
-        IA->getType() != types::TY_HIP) {
+    if (!(IA->getType() == types::TY_CUDA ||
+          IA->getType() == types::TY_HIP ||
+          IA->getType() == types::TY_PP_HIP)) {
       // The builder will ignore this input.
       IsActive = false;
       return ABRT_Inactive;
@@ -2489,7 +2490,7 @@ class OffloadingActionBuilder final {
 
     // If -fgpu-rdc is disabled, should not unbundle since there is no
     // device code to link.
-    if (!Relocatable)
+    if (UA->getType() == types::TY_Object && !Relocatable)
       return ABRT_Inactive;
 
 CudaDeviceActions.clear();
@@ -3250,7 +3251,8 @@ class OffloadingActionBuilder final {
     // the input is not a bundle.
     if (CanUseBundler && isa<InputAction>(HostAction) &&
         InputArg->getOption().getKind() == llvm::opt::Option::InputClass &&
-        !types::isSrcFile(HostAction->getType())) {
+        (!types::isSrcFile(HostAction->getType()) ||
+         HostAction->getType() == types::TY_PP_HIP)) {
       auto UnbundlingHostAction =
           C.MakeAction<OffloadUnbundlingJobAction>(HostAction);
       UnbundlingHostAction->registerDependentActionInfo(

diff  --git a/clang/test/Driver/hip-unbundle-preproc.hip 
b/clang/test/Driver/hip-unbundle-preproc.hip
new file mode 100644
index ..1903c72ceb11
--- /dev/null
+++ b/clang/test/Driver/hip-unbundle-preproc.hip
@@ -0,0 +1,25 @@
+// REQUIRES: clang-driver, amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-unknown-linux-gnu \
+// RUN:   --offload-arch=gfx803 -nogpulib \
+// RUN:   -x hip-cpp-output %s 2>&1 | FileCheck %s
+
+// CHECK: {{".*clang-offload-bundler.*"}} 
{{.*}}"-outputs=[[HOST_PP:.*cui]],[[DEV_PP:.*cui]]" "-unbundle"
+// CHECK: {{".*clang.*"}} "-cc1" {{.*}}"-target-cpu" "gfx803" {{.*}}"-o" 
"[[DEV_O:.*o]]" {{.*}}"[[DEV_PP]]"
+// CHECK: {{".*lld.*"}} {{.*}}"-o" "[[DEV_ISA:.*]]" "[[DEV_O]]"
+// CHECK: {{".*clang-offload-bundler.*"}} {{.*}}"-inputs={{.*}},[[DEV_ISA]]" 
"-outputs=[[FATBIN:.*]]"
+// CHECK: {{".*clang.*"}} {{.*}}"-triple" "x86_64-unknown-linux-gnu"{{.*}} 
"-fcuda-include-gpubinary" "[[FATBIN]]" {{.*}}"-o" "[[HOST_O:.*o]]" 
{{.*}}"[[HOST_PP]]"
+// CHECK: {{".*ld.*"}} {{.*}}"[[HOST_O]]"
+
+// RUN: %clang -### -target x86_64-unknown-linux-gnu \
+// RUN:   --offload-arch=gfx803 -nogpulib -fgpu-rdc \
+// RUN:   -x hip-cpp-output %s 2>&1 | FileCheck -check-prefix=RDC %s
+
+// RDC: {{".*clang-offload-bundler.*"}} 
{{.*}}"-outputs=[[HOST_PP:.*cui]],[[DEV_PP:.*cui]]" "-unbundle"
+// RDC: {{".*clang.*"}} {{.*}}"-triple" "x86_64-unknown-linux-gnu"{{.*}} "-o" 
"[[HOST_O:.*o]]" {{.*}}"[[HOST_PP]]"
+// RDC: {{".*clang-offload-bundler.*"}} 
{{.*}}"-outputs=[[HOST_PP:.*cui]],[[DEV_PP:.*cui]]" "-unbundle"
+// RDC: {{".*clang.*"}} "-cc1" {{.*}}"-target-cpu" "gfx803" {{.*}}"-o" 
"[[DEV_BC:.*bc]]" {{.*}}"[[DEV_PP]]"
+// RDC: {{".*lld.*"}} {{.*}}"-o" "[[DEV_ISA:.*]]" "[[DEV_BC]]"
+// RDC: {{".*clang-offload-bundler.*"}} {{.*}}"-inputs={{.*}},[[DEV_ISA]]" 
"-outputs=[[FATBIN:.*]]"
+// RDC: {{".*llvm-mc.*"}} "-o" "[[FATBIN_O:.*o]]"
+// RDC: {{".*ld.*"}} {{.*}}"[[HOST_O]]" "[[FATBIN_O]]"





[llvm-branch-commits] [clang] b9fb063 - [clang-offload-bundler] Add option -allow-missing-bundles

2020-12-16 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-16T14:52:39-05:00
New Revision: b9fb063e63c7959e8bc9b424bd34b266ca826826

URL: 
https://github.com/llvm/llvm-project/commit/b9fb063e63c7959e8bc9b424bd34b266ca826826
DIFF: 
https://github.com/llvm/llvm-project/commit/b9fb063e63c7959e8bc9b424bd34b266ca826826.diff

LOG: [clang-offload-bundler] Add option -allow-missing-bundles

There are out-of-tree tools that use clang-offload-bundler to extract
bundles from bundled files. When a bundle is not in the bundled
file, clang-offload-bundler is expected to emit an error message
and return a non-zero value. However, clang-offload-bundler currently
generates an empty file silently for each missing bundle.

Since OpenMP/HIP toolchains expect the current behavior, an option
-allow-missing-bundles is added to let clang-offload-bundler
create an empty file for each bundle that is missing when unbundling.
The unbundling job action is updated to use this option by
default.

By default, clang-offload-bundler itself now emits an error when a
bundle is missing during unbundling.

Changes are also made to check for duplicate targets in the -targets
option and emit an error.

Differential Revision: https://reviews.llvm.org/D93068

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/test/Driver/clang-offload-bundler.c
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/test/Driver/openmp-offload.c
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 1c1224f3990b..6ec6a551fafe 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7392,6 +7392,7 @@ void OffloadBundler::ConstructJobMultipleOutputs(
   }
   CmdArgs.push_back(TCArgs.MakeArgString(UB));
   CmdArgs.push_back("-unbundle");
+  CmdArgs.push_back("-allow-missing-bundles");
 
   // All the inputs are encoded as commands.
   C.addCommand(std::make_unique(

diff  --git a/clang/test/Driver/clang-offload-bundler.c 
b/clang/test/Driver/clang-offload-bundler.c
index 21699e78dda6..b4bab6bbd1e8 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -33,6 +33,7 @@
 // CK-HELP: {{.*}}one. The resulting file can also be unbundled into different files by
 // CK-HELP: {{.*}}this tool if -unbundle is provided.
 // CK-HELP: {{.*}}USAGE: clang-offload-bundler [options]
+// CK-HELP: {{.*}}-allow-missing-bundles {{.*}}- Create empty files if bundles are missing when unbundling
 // CK-HELP: {{.*}}-inputs=  - [,...]
 // CK-HELP: {{.*}}-outputs= - [,...]
 // CK-HELP: {{.*}}-targets= - [-,...]
@@ -88,7 +89,7 @@
 // RUN: not clang-offload-bundler -type=i -targets=openmp-powerpc64le-linux,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -inputs=%t.i,%t.tgt1,%t.tgt2 -outputs=%t.bundle.i 2>&1 | FileCheck %s --check-prefix CK-ERR9A
 // RUN: not clang-offload-bundler -type=i -targets=host-%itanium_abi_triple,host-%itanium_abi_triple,openmp-x86_64-pc-linux-gnu -inputs=%t.i,%t.tgt1,%t.tgt2 -outputs=%t.bundle.i 2>&1 | FileCheck %s --check-prefix CK-ERR9B
 // CK-ERR9A: error: expecting exactly one host target but got 0
-// CK-ERR9B: error: expecting exactly one host target but got 2
+// CK-ERR9B: error: Duplicate targets are not allowed
 
 //
 // Check text bundle. This is a readable format, so we check for the format we 
expect to find.
@@ -181,17 +182,17 @@
 // RUN: diff %t.tgt2 %t.res.tgt2

 // Check if we can unbundle a file with no magic strings.
-// RUN: clang-offload-bundler -type=s -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -outputs=%t.res.s,%t.res.tgt1,%t.res.tgt2 -inputs=%t.s -unbundle
+// RUN: clang-offload-bundler -type=s -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -outputs=%t.res.s,%t.res.tgt1,%t.res.tgt2 -inputs=%t.s -unbundle -allow-missing-bundles
 // RUN: diff %t.s %t.res.s
 // RUN: diff %t.empty %t.res.tgt1
 // RUN: diff %t.empty %t.res.tgt2
-// RUN: clang-offload-bundler -type=s -targets=openmp-powerpc64le-ibm-linux-gnu,host-%itanium_abi_triple,openmp-x86_64-pc-linux-gnu -outputs=%t.res.tgt1,%t.res.s,%t.res.tgt2 -inputs=%t.s -unbundle
+// RUN: clang-offload-bundler -type=s -targets=openmp-powerpc64le-ibm-linux-gnu,host-%itanium_abi_triple,openmp-x86_64-pc-linux-gnu -outputs=%t.res.tgt1,%t.res.s,%t.res.tgt2 -inputs=%t.s -unbundle -allow-missing-bundles
 // RUN: diff %t.s %t.res.s
 // RUN: diff %t.empty %t.res.tgt1
 // RUN: diff %t.empty %t.res.tgt2

 // Check that bindler prints an error if given host bundle does not exist in the fat binary.
-// RUN: not clang-offload-bundler -type=s 
-targets=host-x86_64-xxx-linux-gnu,openmp-powerpc64le-ibm-linux-gnu 
-outputs=%t.res.s,%t.res.tgt1 -inputs=%t.bundle3.s -unbundle 2>&1 | FileCheck 
%s --check-prefix CK-NO-HO

[llvm-branch-commits] [clang] 011bf4f - Add help text for -nogpuinc

2020-11-30 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-11-30T22:31:16-05:00
New Revision: 011bf4f55630858111e5f0504b3f7390eaf41e09

URL: 
https://github.com/llvm/llvm-project/commit/011bf4f55630858111e5f0504b3f7390eaf41e09
DIFF: 
https://github.com/llvm/llvm-project/commit/011bf4f55630858111e5f0504b3f7390eaf41e09.diff

LOG: Add help text for -nogpuinc

Differential Revision: https://reviews.llvm.org/D92339

Added: 


Modified: 
clang/include/clang/Driver/Options.td

Removed: 




diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index ac0761ec773f..cd660aef1662 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2873,7 +2873,8 @@ def no_pedantic : Flag<["-", "--"], "no-pedantic">, 
Group;
 def no__dead__strip__inits__and__terms : Flag<["-"], 
"no_dead_strip_inits_and_terms">;
 def nobuiltininc : Flag<["-"], "nobuiltininc">, Flags<[CC1Option, CoreOption]>,
   HelpText<"Disable builtin #include directories">;
-def nogpuinc : Flag<["-"], "nogpuinc">;
+def nogpuinc : Flag<["-"], "nogpuinc">, HelpText<"Do not add include paths for CUDA/HIP and"
+  " do not include the default CUDA/HIP wrapper headers">;
 def : Flag<["-"], "nocudainc">, Alias<nogpuinc>;
 def nogpulib : Flag<["-"], "nogpulib">,
   HelpText<"Do not link device library for CUDA/HIP device compilation">;





[llvm-branch-commits] [clang] cd95338 - [CUDA][HIP] Fix capturing reference to host variable

2020-12-02 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-02T10:14:46-05:00
New Revision: cd95338ee3022bffd658e52cd3eb9419b4c218ca

URL: 
https://github.com/llvm/llvm-project/commit/cd95338ee3022bffd658e52cd3eb9419b4c218ca
DIFF: 
https://github.com/llvm/llvm-project/commit/cd95338ee3022bffd658e52cd3eb9419b4c218ca.diff

LOG: [CUDA][HIP] Fix capturing reference to host variable

In C++ when a reference variable is captured by copy, the lambda
is supposed to make a copy of the referenced variable in the captures
and refer to the copy in the lambda. Therefore, it is valid to capture
a reference to a host global variable in a device lambda since the
device lambda will refer to the copy of the host global variable instead
of accessing the host global variable directly.

However, clang tries to avoid capturing of reference to a host global variable
if it determines the use of the reference variable in the lambda function is
not odr-use. Clang also tries to emit load of the reference to a global variable
as load of the global variable if it determines that the reference variable is
a compile-time constant.

For a device lambda to capture a reference variable to host global variable
and use the captured value, clang needs to be taught that in such cases the use 
of the reference
variable is odr-use and the reference variable is not compile-time constant.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D91088

Added: 
clang/test/CodeGenCUDA/lambda-reference-var.cu

Modified: 
clang/lib/CodeGen/CGExpr.cpp
clang/lib/Sema/SemaExpr.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 325801c83de9..92d0cba7a733 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1522,6 +1522,29 @@ CodeGenFunction::tryEmitAsConstant(DeclRefExpr *refExpr) 
{
   if (result.HasSideEffects)
 return ConstantEmission();
 
+  // In CUDA/HIP device compilation, a lambda may capture a reference variable
+  // referencing a global host variable by copy. In this case the lambda should
+  // make a copy of the value of the global host variable. The DRE of the
+  // captured reference variable cannot be emitted as load from the host
+  // global variable as compile time constant, since the host variable is not
+  // accessible on device. The DRE of the captured reference variable has to be
+  // loaded from captures.
+  if (CGM.getLangOpts().CUDAIsDevice &&
+      refExpr->refersToEnclosingVariableOrCapture()) {
+    auto *MD = dyn_cast_or_null<CXXMethodDecl>(CurCodeDecl);
+    if (MD && MD->getParent()->isLambda() &&
+        MD->getOverloadedOperator() == OO_Call) {
+      const APValue::LValueBase &base = result.Val.getLValueBase();
+      if (const ValueDecl *D = base.dyn_cast<const ValueDecl *>()) {
+        if (const VarDecl *VD = dyn_cast<VarDecl>(D)) {
+          if (!VD->hasAttr<CUDADeviceAttr>()) {
+            return ConstantEmission();
+          }
+        }
+      }
+    }
+  }
+
   // Emit as a constant.
   auto C = ConstantEmitter(*this).emitAbstract(refExpr->getLocation(),
result.Val, resultType);

diff  --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 88dab26f2e3b..9c2fc1b9e6dd 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -1934,6 +1934,35 @@ Sema::BuildDeclRefExpr(ValueDecl *D, QualType Ty, 
ExprValueKind VK,
   TemplateArgs);
 }
 
+// CUDA/HIP: Check whether a captured reference variable is referencing a
+// host variable in a device or host device lambda.
+static bool isCapturingReferenceToHostVarInCUDADeviceLambda(const Sema &S,
+VarDecl *VD) {
+  if (!S.getLangOpts().CUDA || !VD->hasInit())
+return false;
+  assert(VD->getType()->isReferenceType());
+
+  // Check whether the reference variable is referencing a host variable.
+  auto *DRE = dyn_cast<DeclRefExpr>(VD->getInit());
+  if (!DRE)
+return false;
+  auto *Referee = dyn_cast<VarDecl>(DRE->getDecl());
+  if (!Referee || !Referee->hasGlobalStorage() ||
+  Referee->hasAttr<CUDADeviceAttr>())
+return false;
+
+  // Check whether the current function is a device or host device lambda.
+  // Check whether the reference variable is a capture by getDeclContext()
+  // since refersToEnclosingVariableOrCapture() is not ready at this point.
+  auto *MD = dyn_cast_or_null<CXXMethodDecl>(S.CurContext);
+  if (MD && MD->getParent()->isLambda() &&
+  MD->getOverloadedOperator() == OO_Call &&
+  MD->hasAttr<CUDADeviceAttr>() &&
+  VD->getDeclContext() != MD)
+return true;
+
+  return false;
+}
+
 NonOdrUseReason Sema::getNonOdrUseReasonInCurrentContext(ValueDecl *D) {
   // A declaration named in an unevaluated operand never constitutes an 
odr-use.
   if (isUnevaluatedContext())
@@ -1943,9 +1972,16 @@ NonOdrUseReason 
Sema::getNonOdrUseReasonInCurrentContext(ValueDecl *D) {
   //   A variable x whose name appears as a potentiall

[llvm-branch-commits] [clang] 5c8911d - [CUDA][HIP] Diagnose reference of host variable

2020-12-02 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-02T10:15:56-05:00
New Revision: 5c8911d0ba3862119d2507aa55b94766263be13b

URL: 
https://github.com/llvm/llvm-project/commit/5c8911d0ba3862119d2507aa55b94766263be13b
DIFF: 
https://github.com/llvm/llvm-project/commit/5c8911d0ba3862119d2507aa55b94766263be13b.diff

LOG: [CUDA][HIP] Diagnose reference of host variable

This patch diagnoses invalid references of global host variables in device,
global, or host device functions.

Differential Revision: https://reviews.llvm.org/D91281

Added: 
clang/test/SemaCUDA/device-use-host-var.cu

Modified: 
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/Sema/SemaCUDA.cpp
clang/lib/Sema/SemaExpr.cpp
clang/test/CodeGenCUDA/function-overload.cu
clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index f2b2b1d3ab6f..3067c077ddb2 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8145,7 +8145,7 @@ def err_global_call_not_config : Error<
   "call to global function %0 not configured">;
 def err_ref_bad_target : Error<
   "reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
-  "function %1 in %select{__device__|__global__|__host__|__host__ __device__}2 
function">;
+  "%select{function|variable}1 %2 in 
%select{__device__|__global__|__host__|__host__ __device__}3 function">;
 def err_ref_bad_target_global_initializer : Error<
   "reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
   "function %1 in global initializer">;

diff  --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 12a28ab392f8..0f06adf38f7a 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -743,7 +743,8 @@ bool Sema::CheckCUDACall(SourceLocation Loc, FunctionDecl 
*Callee) {
 return true;
 
   SemaDiagnosticBuilder(DiagKind, Loc, diag::err_ref_bad_target, Caller, *this)
-  << IdentifyCUDATarget(Callee) << Callee << IdentifyCUDATarget(Caller);
+  << IdentifyCUDATarget(Callee) << /*function*/ 0 << Callee
+  << IdentifyCUDATarget(Caller);
   if (!Callee->getBuiltinID())
 SemaDiagnosticBuilder(DiagKind, Callee->getLocation(),
   diag::note_previous_decl, Caller, *this)

diff  --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 9c2fc1b9e6dd..527605ac4fb8 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -354,6 +354,24 @@ bool Sema::DiagnoseUseOfDecl(NamedDecl *D, ArrayRef<SourceLocation> Locs,
 
   diagnoseUseOfInternalDeclInInlineFunction(*this, D, Loc);
 
+  // CUDA/HIP: Diagnose invalid references of host global variables in device
+  // functions. Reference of device global variables in host functions is
+  // allowed through shadow variables therefore it is not diagnosed.
+  if (LangOpts.CUDAIsDevice) {
+auto *FD = dyn_cast_or_null<FunctionDecl>(CurContext);
+auto Target = IdentifyCUDATarget(FD);
+if (FD && Target != CFT_Host) {
+  const auto *VD = dyn_cast<VarDecl>(D);
+  if (VD && VD->hasGlobalStorage() && !VD->hasAttr<CUDADeviceAttr>() &&
+  !VD->hasAttr<CUDAConstantAttr>() && !VD->hasAttr<CUDASharedAttr>() &&
+  !VD->getType()->isCUDADeviceBuiltinSurfaceType() &&
+  !VD->getType()->isCUDADeviceBuiltinTextureType() &&
+  !VD->isConstexpr() && !VD->getType().isConstQualified())
+targetDiag(*Locs.begin(), diag::err_ref_bad_target)
+<< /*host*/ 2 << /*variable*/ 1 << VD << Target;
+}
+  }
+
   if (LangOpts.SYCLIsDevice || (LangOpts.OpenMP && LangOpts.OpenMPIsDevice)) {
 if (const auto *VD = dyn_cast(D))
   checkDeviceDecl(VD, Loc);

diff  --git a/clang/test/CodeGenCUDA/function-overload.cu 
b/clang/test/CodeGenCUDA/function-overload.cu
index c82b2e96f6c3..9677a5b43b8c 100644
--- a/clang/test/CodeGenCUDA/function-overload.cu
+++ b/clang/test/CodeGenCUDA/function-overload.cu
@@ -12,6 +12,9 @@
 #include "Inputs/cuda.h"
 
 // Check constructors/destructors for D/H functions
+#ifdef __CUDA_ARCH__
+__device__
+#endif
 int x;
 struct s_cd_dh {
   __host__ s_cd_dh() { x = 11; }

diff  --git a/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp 
b/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
index 77ea3d485c8a..16600d15f2c4 100644
--- a/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
+++ b/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
@@ -124,7 +124,7 @@ __attribute__((device)) void test_shared64() {
   val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, 
"workgroup");
 }
 
-__UINT32_TYPE__ global_val32;
+__attribute__((device)) __UINT32_TYPE__ global_val32;
 __attribute__((device)) void test_global32() {
   // CHECK-LABEL: test_global32
   // CHECK: %0 = load i32, i32* addrspacecast (i32 addrspace(1)* @global_val32 
to i32*), align 4
@@ -138

[llvm-branch-commits] [clang] acb6f80 - [CUDA][HIP] Fix overloading resolution

2020-12-02 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-02T16:33:33-05:00
New Revision: acb6f80d96b74af3ec515bb9811d213abb406c31

URL: 
https://github.com/llvm/llvm-project/commit/acb6f80d96b74af3ec515bb9811d213abb406c31
DIFF: 
https://github.com/llvm/llvm-project/commit/acb6f80d96b74af3ec515bb9811d213abb406c31.diff

LOG: [CUDA][HIP] Fix overloading resolution

This patch implements correct hostness based overloading resolution
in isBetterOverloadCandidate.

Based on hostness, if one candidate is emittable whereas the other
candidate is not emittable, the emittable candidate is better.

If both candidates are emittable, or neither is emittable based on hostness, 
then
other rules should be used to determine which is better. This is because
hostness based overloading resolution is mostly for determining
viability of a function. If two functions are both viable, other factors
should take precedence in preference.

If other rules cannot determine which is better, CUDA preference will be
used again to determine which is better.

However, correct hostness-based overloading resolution
requires overloading resolution diagnostics to be deferred,
which is not on by default. The rationale is that deferring
overloading resolution diagnostics may hide overload resolution
issues in header files.

An option -fgpu-exclude-wrong-side-overloads is added, which is off by
default.

When -fgpu-exclude-wrong-side-overloads is off, keep the original behavior,
that is, exclude wrong side overloads only if there are same side overloads.
This may result in incorrect overloading resolution when there are no
same side candidates, but is sufficient for most CUDA/HIP applications.

When -fgpu-exclude-wrong-side-overloads is on, enable deferring
overloading resolution diagnostics and enable correct hostness
based overloading resolution, i.e., always exclude wrong side overloads.
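The ordering rules above can be sketched as follows (an illustrative model, not clang's actual `isBetterOverloadCandidate`; the struct fields are stand-ins for the real bookkeeping):

```cpp
#include <cassert>

// Stand-in candidate model: `emittable` is the hostness-based viability,
// `otherScore` summarizes all non-CUDA ranking rules, and `cudaPref` is
// the CUDA preference consulted only as a final tie-breaker.
struct Candidate {
  bool emittable;
  int otherScore;
  int cudaPref;
};

// Returns true if A is a better candidate than B.
bool isBetter(const Candidate &A, const Candidate &B) {
  if (A.emittable != B.emittable)   // emittable beats non-emittable
    return A.emittable;
  if (A.otherScore != B.otherScore) // then ordinary overload rules decide
    return A.otherScore > B.otherScore;
  return A.cudaPref > B.cudaPref;   // CUDA preference breaks remaining ties
}
```

The key design point is that hostness determines viability first, defers to the ordinary rules next, and only reappears when everything else ties.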

Differential Revision: https://reviews.llvm.org/D80450

Added: 


Modified: 
clang/include/clang/Basic/LangOptions.def
clang/include/clang/Driver/Options.td
clang/include/clang/Sema/Overload.h
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Frontend/CompilerInvocation.cpp
clang/lib/Sema/SemaOverload.cpp
clang/test/Driver/hip-options.hip
clang/test/SemaCUDA/deferred-oeverload.cu
clang/test/SemaCUDA/function-overload.cu

Removed: 




diff  --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index f41febf30c53..071cc314b7d1 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -243,6 +243,7 @@ LANGOPT(GPURelocatableDeviceCode, 1, 0, "generate 
relocatable device code")
 LANGOPT(GPUAllowDeviceInit, 1, 0, "allowing device side global init functions 
for HIP")
 LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for 
kernel launch bounds for HIP")
 LANGOPT(GPUDeferDiag, 1, 0, "defer host/device related diagnostic messages for 
CUDA/HIP")
+LANGOPT(GPUExcludeWrongSideOverloads, 1, 0, "always exclude wrong side 
overloads in overloading resolution for CUDA/HIP")
 
 LANGOPT(SYCL  , 1, 0, "SYCL")
 LANGOPT(SYCLIsDevice  , 1, 0, "Generate code for SYCL device")

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 6e37a3154bdf..b58f5cbc63d0 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -721,6 +721,9 @@ defm gpu_allow_device_init : 
OptInFFlag<"gpu-allow-device-init",
 defm gpu_defer_diag : OptInFFlag<"gpu-defer-diag",
   "Defer", "Don't defer", " host/device related diagnostic messages"
   " for CUDA/HIP">;
+defm gpu_exclude_wrong_side_overloads : 
OptInFFlag<"gpu-exclude-wrong-side-overloads",
+  "Always exclude wrong side overloads", "Exclude wrong side overloads only if 
there are same side overloads",
+  " in overloading resolution for CUDA/HIP", [HelpHidden]>;
 def gpu_max_threads_per_block_EQ : Joined<["--"], 
"gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
   HelpText<"Default max threads per block for kernel launch bounds for HIP">;

diff  --git a/clang/include/clang/Sema/Overload.h 
b/clang/include/clang/Sema/Overload.h
index 4f5e497bc202..5be6a618711c 100644
--- a/clang/include/clang/Sema/Overload.h
+++ b/clang/include/clang/Sema/Overload.h
@@ -1051,6 +1051,9 @@ class Sema;
 
 void destroyCandidates();
 
+/// Whether diagnostics should be deferred.
+bool shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args, SourceLocation OpLoc);
+
   public:
 OverloadCandidateSet(SourceLocation Loc, CandidateSetKind CSK,
  OperatorRewriteInfo RewriteInfo = {})

diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index caa77123f7eb..a513c0025a62 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5610,6 +5610,12 @@ void Clang::

[llvm-branch-commits] [clang] 3a781b9 - Fix assertion in tryEmitAsConstant

2020-12-02 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-02T19:10:01-05:00
New Revision: 3a781b912fc7b492a21fe52cc8ce6c9e5854a9ab

URL: 
https://github.com/llvm/llvm-project/commit/3a781b912fc7b492a21fe52cc8ce6c9e5854a9ab
DIFF: 
https://github.com/llvm/llvm-project/commit/3a781b912fc7b492a21fe52cc8ce6c9e5854a9ab.diff

LOG: Fix assertion in tryEmitAsConstant

due to cd95338ee3022bffd658e52cd3eb9419b4c218ca

Need to check if result is LValue before getLValueBase.

Added: 


Modified: 
clang/lib/CodeGen/CGExpr.cpp
clang/test/CodeGenCUDA/lambda-reference-var.cu

Removed: 




diff  --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 92d0cba7a733..11914b6cd9fb 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1529,7 +1529,7 @@ CodeGenFunction::tryEmitAsConstant(DeclRefExpr *refExpr) {
   // global variable as compile time constant, since the host variable is not
   // accessible on device. The DRE of the captured reference variable has to be
   // loaded from captures.
-  if (CGM.getLangOpts().CUDAIsDevice &&
+  if (CGM.getLangOpts().CUDAIsDevice && result.Val.isLValue() &&
   refExpr->refersToEnclosingVariableOrCapture()) {
 auto *MD = dyn_cast_or_null<CXXMethodDecl>(CurCodeDecl);
 if (MD && MD->getParent()->isLambda() &&

diff  --git a/clang/test/CodeGenCUDA/lambda-reference-var.cu 
b/clang/test/CodeGenCUDA/lambda-reference-var.cu
index 6d7b343b3193..44b012956507 100644
--- a/clang/test/CodeGenCUDA/lambda-reference-var.cu
+++ b/clang/test/CodeGenCUDA/lambda-reference-var.cu
@@ -27,6 +27,15 @@ __device__ void dev_capture_dev_ref_by_copy(int *out) {
   [=](){ *out = ref;}();
 }
 
+// DEV-LABEL: @_ZZ28dev_capture_dev_rval_by_copyPiENKUlvE_clEv(
+// DEV: store i32 3
+__device__ void dev_capture_dev_rval_by_copy(int *out) {
+  constexpr int a = 1;
+  constexpr int b = 2;
+  constexpr int c = a + b;
+  [=](){ *out = c;}();
+}
+
 // DEV-LABEL: @_ZZ26dev_capture_dev_ref_by_refPiENKUlvE_clEv(
 // DEV: %[[VAL:.*]] = load i32, i32* addrspacecast (i32 addrspace(1)* 
@global_device_var to i32*)
 // DEV: %[[VAL2:.*]] = add nsw i32 %[[VAL]], 1



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 0519e1d - [HIP] Fix bug in driver about wavefront size

2020-12-04 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-04T08:36:52-05:00
New Revision: 0519e1ddb3885d070f054ca30a7487f915f6f795

URL: 
https://github.com/llvm/llvm-project/commit/0519e1ddb3885d070f054ca30a7487f915f6f795
DIFF: 
https://github.com/llvm/llvm-project/commit/0519e1ddb3885d070f054ca30a7487f915f6f795.diff

LOG: [HIP] Fix bug in driver about wavefront size

The static variable caused it to be initialized only once, so it took
the same value for different GPU archs, whereas the value may differ
per GPU arch, e.g. when there are both gfx900 and gfx1010.

Removing the static fixes that.
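The bug reduces to a function-local `static` being initialized from the first architecture queried and never again. A sketch (the wave32 bit position here is a placeholder, not the real `FEATURE_WAVE32` value):

```cpp
#include <cassert>

constexpr unsigned kWave32Bit = 1u; // placeholder for FEATURE_WAVE32

// Buggy version: the static is initialized from the *first* arch queried
// and keeps that value for every later arch.
bool isWave64Buggy(unsigned archAttr) {
  static bool hasWave32 = (archAttr & kWave32Bit);
  return !hasWave32;
}

// Fixed version: recompute per call, as the patch does.
bool isWave64Fixed(unsigned archAttr) {
  bool hasWave32 = (archAttr & kWave32Bit);
  return !hasWave32;
}
```

Once a wave64-only arch is queried first, the buggy version reports wave64 even for archs that support wave32.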

Differential Revision: https://reviews.llvm.org/D92628

Added: 
clang/test/Driver/hip-wavefront-size.hip

Modified: 
clang/lib/Driver/ToolChains/AMDGPU.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 5df7236f0223..1220594281ec 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -499,7 +499,7 @@ llvm::DenormalMode 
AMDGPUToolChain::getDefaultDenormalModeForType(
 bool AMDGPUToolChain::isWave64(const llvm::opt::ArgList &DriverArgs,
llvm::AMDGPU::GPUKind Kind) {
   const unsigned ArchAttr = llvm::AMDGPU::getArchAttrAMDGCN(Kind);
-  static bool HasWave32 = (ArchAttr & llvm::AMDGPU::FEATURE_WAVE32);
+  bool HasWave32 = (ArchAttr & llvm::AMDGPU::FEATURE_WAVE32);
 
   return !HasWave32 || DriverArgs.hasFlag(
 options::OPT_mwavefrontsize64, options::OPT_mno_wavefrontsize64, false);

diff  --git a/clang/test/Driver/hip-wavefront-size.hip 
b/clang/test/Driver/hip-wavefront-size.hip
new file mode 100644
index ..dd7ca16ae2d3
--- /dev/null
+++ b/clang/test/Driver/hip-wavefront-size.hip
@@ -0,0 +1,21 @@
+// REQUIRES: clang-driver,amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --rocm-path=%S/Inputs/rocm --cuda-device-only %s \
+// RUN:   2>&1 | FileCheck %s --check-prefixes=WAVE64
+// WAVE64: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"{{.*}} 
"-target-cpu" "gfx900"
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx1010 \
+// RUN:   --rocm-path=%S/Inputs/rocm --cuda-device-only %s \
+// RUN:   2>&1 | FileCheck %s --check-prefixes=WAVE32
+// WAVE32: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_off.bc"{{.*}} 
"-target-cpu" "gfx1010"
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx1010 \
+// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --rocm-path=%S/Inputs/rocm --cuda-device-only %s \
+// RUN:   2>&1 | FileCheck %s --check-prefixes=BOTH
+// BOTH-DAG: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"{{.*}} 
"-target-cpu" "gfx900"
+// BOTH-DAG: "-mlink-builtin-bitcode" 
"{{.*}}oclc_wavefrontsize64_off.bc"{{.*}} "-target-cpu" "gfx1010"





[llvm-branch-commits] [llvm] 40ad476 - [clang][AMDGPU] rename sram-ecc as sramecc

2020-12-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-07T18:05:47-05:00
New Revision: 40ad476a32445ec98666adcf24d2b33fd887ccc6

URL: 
https://github.com/llvm/llvm-project/commit/40ad476a32445ec98666adcf24d2b33fd887ccc6
DIFF: 
https://github.com/llvm/llvm-project/commit/40ad476a32445ec98666adcf24d2b33fd887ccc6.diff

LOG: [clang][AMDGPU] rename sram-ecc as sramecc

As the backend renamed sram-ecc to sramecc, this patch makes the
corresponding change in clang.
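The renamed feature shows up in target IDs such as `gfx908:sramecc+:xnack-`: a processor name followed by `:feature+` / `:feature-` entries. A minimal sketch of parsing that syntax (illustrative only, not clang's `TargetID.cpp` implementation):

```cpp
#include <string>
#include <utility>
#include <vector>

// Parsed form of a target ID like "gfx908:sramecc+:xnack-".
struct TargetID {
  std::string processor;
  std::vector<std::pair<std::string, bool>> features; // name -> enabled
};

TargetID parseTargetID(const std::string &id) {
  TargetID result;
  size_t pos = id.find(':');
  result.processor = id.substr(0, pos); // everything before the first ':'
  while (pos != std::string::npos) {
    size_t next = id.find(':', pos + 1);
    std::string tok = id.substr(pos + 1, next == std::string::npos
                                             ? std::string::npos
                                             : next - pos - 1);
    // The last character is the '+'/'-' sign; the rest is the feature name.
    result.features.emplace_back(tok.substr(0, tok.size() - 1),
                                 tok.back() == '+');
    pos = next;
  }
  return result;
}
```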

Differential Revision: https://reviews.llvm.org/D86217

Added: 


Modified: 
clang/include/clang/Basic/DiagnosticDriverKinds.td
clang/include/clang/Basic/TargetID.h
clang/include/clang/Driver/Options.td
clang/lib/Basic/TargetID.cpp
clang/lib/Basic/Targets/AMDGPU.h
clang/test/Driver/amdgpu-features.c
clang/test/Driver/hip-invalid-target-id.hip
clang/test/Driver/hip-target-id.hip
clang/test/Driver/hip-toolchain-features.hip
clang/test/Driver/invalid-target-id.cl
clang/test/Driver/target-id-macros.cl
clang/test/Driver/target-id-macros.hip
clang/test/Driver/target-id.cl
llvm/include/llvm/Support/TargetParser.h
llvm/lib/Support/TargetParser.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td 
b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index d6a2609e60f9..8fd7a805589d 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -79,7 +79,7 @@ def err_drv_cuda_host_arch : Error<"unsupported architecture 
'%0' for host compi
 def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not 
supported.">;
 def err_drv_bad_target_id : Error<"Invalid target ID: %0 (A target ID is a 
processor name "
   "followed by an optional list of predefined features post-fixed by a plus or 
minus sign deliminated "
-  "by colon, e.g. 'gfx908:sram-ecc+:xnack-')">;
+  "by colon, e.g. 'gfx908:sramecc+:xnack-')">;
 def err_drv_bad_offload_arch_combo : Error<"Invalid offload arch combinations: 
%0 and %1 (For a specific "
   "processor, a feature should either exist in all offload archs, or not exist 
in any offload archs)">;
 def err_drv_invalid_thread_model_for_target : Error<

diff  --git a/clang/include/clang/Basic/TargetID.h 
b/clang/include/clang/Basic/TargetID.h
index 95fd61d22eb1..1a9785574d06 100644
--- a/clang/include/clang/Basic/TargetID.h
+++ b/clang/include/clang/Basic/TargetID.h
@@ -19,7 +19,7 @@ namespace clang {
 /// Get all feature strings that can be used in target ID for \p Processor.
 /// Target ID is a processor name with optional feature strings
 /// postfixed by a plus or minus sign delimited by colons, e.g.
-/// gfx908:xnack+:sram-ecc-. Each processor have a limited
+/// gfx908:xnack+:sramecc-. Each processor have a limited
 /// number of predefined features when showing up in a target ID.
 const llvm::SmallVector
 getAllPossibleTargetIDFeatures(const llvm::Triple &T,

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 6480d6e80293..347349031669 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -666,7 +666,7 @@ def no_cuda_include_ptx_EQ : Joined<["--"], 
"no-cuda-include-ptx=">, Flags<[NoXa
 def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
   HelpText<"CUDA offloading device architecture (e.g. sm_35), or HIP 
offloading target ID in the form of a "
"device architecture followed by target ID features delimited by a 
colon. Each target ID feature "
-   "is a pre-defined string followed by a plus or minus sign (e.g. 
gfx908:xnack+:sram-ecc-).  May be "
+   "is a pre-defined string followed by a plus or minus sign (e.g. 
gfx908:xnack+:sramecc-).  May be "
"specified more than once.">;
 def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, 
Flags<[NoXarchOption]>,
  Alias<offload_arch_EQ>;
@@ -2568,9 +2568,9 @@ def mcumode : Flag<["-"], "mcumode">, 
Group,
   HelpText<"Specify CU (-mcumode) or WGP (-mno-cumode) wavefront execution 
mode (AMDGPU only)">;
 def mno_cumode : Flag<["-"], "mno-cumode">, Group;
 
-def msram_ecc : Flag<["-"], "msram-ecc">, Group,
+def msramecc : Flag<["-"], "msramecc">, Group,
   HelpText<"Specify SRAM ECC mode (AMDGPU only)">;
-def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group;
+def mno_sramecc : Flag<["-"], "mno-sramecc">, Group;
 
 def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">, Group,
   HelpText<"Specify wavefront size 64 mode (AMDGPU only)">;

diff  --git a/clang/lib/Basic/TargetID.cpp b/clang/lib/Basic/TargetID.cpp
index 3bb895f28832..59d416f0e015 100644
--- a/clang/lib/Basic/TargetID.cpp
+++ b/clang/lib/Basic/TargetID.cpp
@@ -26,8 +26,8 @@ getAllPossibleAMDGPUTargetIDFeatures(const llvm::Triple &T,
 return Ret;
   auto Features = T.isAMDGCN() ? llvm::AMDGPU::getArchAttrAMDGCN(ProcKind)
: llvm::AMD

[llvm-branch-commits] [clang] 4bed1d9 - [HIP] fix bundle entry ID for --

2020-12-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-07T18:08:37-05:00
New Revision: 4bed1d9b32b19f786aed17865e08c966962513cd

URL: 
https://github.com/llvm/llvm-project/commit/4bed1d9b32b19f786aed17865e08c966962513cd
DIFF: 
https://github.com/llvm/llvm-project/commit/4bed1d9b32b19f786aed17865e08c966962513cd.diff

LOG: [HIP] fix bundle entry ID for --

Canonicalize the triple used in the fat binary: change from
amdgcn-amd-amdhsa to amdgcn-amd-amdhsa-.

This is part of https://reviews.llvm.org/D60620
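The effect on bundle entry IDs can be sketched as follows (the helper name is illustrative, not clang's): the canonical triple spells out all four components, so an empty environment keeps its trailing `-`, producing the double dash seen in the updated tests below.

```cpp
#include <string>

// Bundle entry ID layout: offload kind, canonical triple, target ID.
// A canonical triple with an empty environment ends in '-', so joining
// it to the target ID yields a double dash.
std::string bundleEntryID(const std::string &kind,
                          const std::string &canonicalTriple,
                          const std::string &targetID) {
  return kind + "-" + canonicalTriple + "-" + targetID;
}
```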

Added: 


Modified: 
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/hip-target-id.hip
clang/test/Driver/hip-toolchain-device-only.hip
clang/test/Driver/hip-toolchain-no-rdc.hip
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/test/Driver/hip-toolchain-rdc-static-lib.hip
clang/test/Driver/hip-toolchain-rdc.hip

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/HIP.cpp 
b/clang/lib/Driver/ToolChains/HIP.cpp
index fc1103b48a99..d2f8571e41fb 100644
--- a/clang/lib/Driver/ToolChains/HIP.cpp
+++ b/clang/lib/Driver/ToolChains/HIP.cpp
@@ -120,7 +120,7 @@ void AMDGCN::constructHIPFatbinCommand(Compilation &C, 
const JobAction &JA,
 
   for (const auto &II : Inputs) {
 const auto* A = II.getAction();
-BundlerTargetArg = BundlerTargetArg + ",hip-amdgcn-amd-amdhsa-" +
+BundlerTargetArg = BundlerTargetArg + ",hip-amdgcn-amd-amdhsa--" +
StringRef(A->getOffloadingArch()).str();
 BundlerInputArg = BundlerInputArg + "," + II.getFilename();
   }

diff  --git a/clang/test/Driver/hip-target-id.hip 
b/clang/test/Driver/hip-target-id.hip
index 073f01ca812a..4e5aba65ce11 100644
--- a/clang/test/Driver/hip-target-id.hip
+++ b/clang/test/Driver/hip-target-id.hip
@@ -47,7 +47,7 @@
 // CHECK-SAME: "-plugin-opt=-mattr=-sramecc,+xnack"
 
 // CHECK: {{"[^"]*clang-offload-bundler[^"]*"}}
-// CHECK-SAME: 
"-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908:sramecc+:xnack+,hip-amdgcn-amd-amdhsa-gfx908:sramecc-:xnack+"
+// CHECK-SAME: 
"-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+,hip-amdgcn-amd-amdhsa--gfx908:sramecc-:xnack+"
 
 // Check canonicalization and repeating of target ID.
 
@@ -58,7 +58,7 @@
 // RUN:   --offload-arch=fiji \
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=FIJI %s
-// FIJI: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx803"
+// FIJI: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx803"
 
 // RUN: %clang -### -target x86_64-linux-gnu \
 // RUN:   -x hip \
@@ -69,4 +69,4 @@
 // RUN:   --offload-arch=gfx906 \
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=MULTI %s
-// MULTI: 
"-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx900:xnack+,hip-amdgcn-amd-amdhsa-gfx900:xnack-,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx908:sramecc+,hip-amdgcn-amd-amdhsa-gfx908:sramecc-"
+// MULTI: 
"-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx900:xnack+,hip-amdgcn-amd-amdhsa--gfx900:xnack-,hip-amdgcn-amd-amdhsa--gfx906,hip-amdgcn-amd-amdhsa--gfx908:sramecc+,hip-amdgcn-amd-amdhsa--gfx908:sramecc-"

diff  --git a/clang/test/Driver/hip-toolchain-device-only.hip 
b/clang/test/Driver/hip-toolchain-device-only.hip
index e05447f426bd..b3fd7ceb235f 100644
--- a/clang/test/Driver/hip-toolchain-device-only.hip
+++ b/clang/test/Driver/hip-toolchain-device-only.hip
@@ -25,5 +25,5 @@
 // CHECK-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" [[OBJ_DEV_A_900]]
 
 // CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
-// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
+// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"
 // CHECK-SAME: "-inputs={{.*}},[[IMG_DEV_A_803]],[[IMG_DEV_A_900]]" 
"-outputs=[[BUNDLE_A:.*hipfb]]"

diff  --git a/clang/test/Driver/hip-toolchain-no-rdc.hip 
b/clang/test/Driver/hip-toolchain-no-rdc.hip
index 471c3022ecef..8283bd3d078d 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -82,7 +82,7 @@
 
 // CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
 // CHECK-SAME: "-bundle-align=4096"
-// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
+// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"
 // CHECK-SAME: "-inputs={{.*}},[[IMG_DEV_A_803]],[[IMG_DEV_A_900]]" 
"-outputs=[[BUNDLE_A:.*hipfb]]"
 
 // CHECK: [[CLANG]] "-cc1" "-triple" "x86_64-unknown-linux-gnu"
@@ -145,7 +145,7 @@
 
 // CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
 // CHECK-SAME: "-bundle-align=4096"
-// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
+// CHECK-SAME: 
"-targets={{.*}},hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"
 // CHECK-SAME: "-inputs=

[llvm-branch-commits] [clang] 5cae708 - [clang][AMDGPU] remove mxnack and msramecc options

2020-12-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-07T18:08:37-05:00
New Revision: 5cae70800266119bbf319675a175cba9a7f315b1

URL: 
https://github.com/llvm/llvm-project/commit/5cae70800266119bbf319675a175cba9a7f315b1
DIFF: 
https://github.com/llvm/llvm-project/commit/5cae70800266119bbf319675a175cba9a7f315b1.diff

LOG: [clang][AMDGPU] remove mxnack and msramecc options

Remove mxnack and msramecc options since they
are deprecated by --offload-arch.

This is part of https://reviews.llvm.org/D60620

Added: 


Modified: 
clang/include/clang/Driver/Options.td

Removed: 




diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 347349031669..4f6851774522 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2568,19 +2568,11 @@ def mcumode : Flag<["-"], "mcumode">, 
Group,
   HelpText<"Specify CU (-mcumode) or WGP (-mno-cumode) wavefront execution 
mode (AMDGPU only)">;
 def mno_cumode : Flag<["-"], "mno-cumode">, Group;
 
-def msramecc : Flag<["-"], "msramecc">, Group,
-  HelpText<"Specify SRAM ECC mode (AMDGPU only)">;
-def mno_sramecc : Flag<["-"], "mno-sramecc">, Group;
-
 def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">, Group,
   HelpText<"Specify wavefront size 64 mode (AMDGPU only)">;
 def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">, Group,
   HelpText<"Specify wavefront size 32 mode (AMDGPU only)">;
 
-def mxnack : Flag<["-"], "mxnack">, Group,
-  HelpText<"Specify XNACK mode (AMDGPU only)">;
-def mno_xnack : Flag<["-"], "mno-xnack">, Group;
-
 def munsafe_fp_atomics : Flag<["-"], "munsafe-fp-atomics">, Group,
   HelpText<"Enable unsafe floating point atomic instructions (AMDGPU only)">,
   Flags<[CC1Option]>;





[llvm-branch-commits] [clang] 0b81d9a - [AMDGPU] add -mcode-object-version=n

2020-12-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-07T18:08:37-05:00
New Revision: 0b81d9a992579ef55b0781c9bc678aa1f3133e9e

URL: 
https://github.com/llvm/llvm-project/commit/0b81d9a992579ef55b0781c9bc678aa1f3133e9e
DIFF: 
https://github.com/llvm/llvm-project/commit/0b81d9a992579ef55b0781c9bc678aa1f3133e9e.diff

LOG: [AMDGPU] add -mcode-object-version=n

Add option -mcode-object-version=n to control code object version for
AMDGPU.

Differential Revision: https://reviews.llvm.org/D91310

Added: 
clang/test/Driver/hip-code-object-version.hip

Modified: 
clang/docs/ClangCommandLineReference.rst
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/AMDGPU.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/lib/Driver/ToolChains/CommonArgs.h
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/amdgpu-features-as.s
clang/test/Driver/amdgpu-features.c
clang/test/Driver/hip-autolink.hip
clang/test/Driver/hip-device-compile.hip
clang/test/Driver/hip-host-cpu-features.hip
clang/test/Driver/hip-rdc-device-only.hip
clang/test/Driver/hip-target-id.hip
clang/test/Driver/hip-toolchain-device-only.hip
clang/test/Driver/hip-toolchain-mllvm.hip
clang/test/Driver/hip-toolchain-no-rdc.hip
clang/test/Driver/hip-toolchain-opt.hip
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/test/Driver/hip-toolchain-rdc-static-lib.hip
clang/test/Driver/hip-toolchain-rdc.hip

Removed: 




diff  --git a/clang/docs/ClangCommandLineReference.rst 
b/clang/docs/ClangCommandLineReference.rst
index ce510f335bd4..b46008970f57 100644
--- a/clang/docs/ClangCommandLineReference.rst
+++ b/clang/docs/ClangCommandLineReference.rst
@@ -2663,6 +2663,10 @@ Align selected branches (fused, jcc, jmp) within 32-byte 
boundary
 
 Legacy option to specify code object ABI V2 (-mnocode-object-v3) or V3 
(-mcode-object-v3) (AMDGPU only)
 
+.. option:: -mcode-object-version=
+
+Specify code object ABI version. Defaults to 4. (AMDGPU only)
+
 .. option:: -mconsole
 
 .. program:: clang1

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 4f6851774522..c6159f50b781 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2560,6 +2560,10 @@ def mexec_model_EQ : Joined<["-"], "mexec-model=">, 
Group,
  HelpText<"Execution model (WebAssembly only)">;
 
+def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
+  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  MetaVarName<"">, Values<"2,3,4">;
+
 def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group,
   HelpText<"Legacy option to specify code object ABI V2 (-mnocode-object-v3) 
or V3 (-mcode-object-v3) (AMDGPU only)">;
 def mno_code_object_v3_legacy : Flag<["-"], "mno-code-object-v3">, 
Group;

diff  --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 1220594281ec..565a77e07fd8 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -399,8 +399,14 @@ void amdgpu::getAMDGPUTargetFeatures(const Driver &D,
 AMDGPUToolChain::AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,
  const ArgList &Args)
 : Generic_ELF(D, Triple, Args),
-  OptionsDefault({{options::OPT_O, "3"},
-  {options::OPT_cl_std_EQ, "CL1.2"}}) {}
+  OptionsDefault(
+  {{options::OPT_O, "3"}, {options::OPT_cl_std_EQ, "CL1.2"}}) {
+  // Check code object version options. Emit warnings for legacy options
+  // and errors for the last invalid code object version options.
+  // It is done here to avoid repeated warning or error messages for
+  // each tool invocation.
+  (void)getOrCheckAMDGPUCodeObjectVersion(D, Args, /*Diagnose=*/true);
+}
 
 Tool *AMDGPUToolChain::buildLinker() const {
   return new tools::amdgpu::Linker(*this);

diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index a513c0025a62..86d4c5a8658a 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1064,24 +1064,14 @@ static const char 
*RelocationModelName(llvm::Reloc::Model Model) {
   }
   llvm_unreachable("Unknown Reloc::Model kind");
 }
-
-static void HandleAmdgcnLegacyOptions(const Driver &D,
-  const ArgList &Args,
-  ArgStringList &CmdArgs) {
-  if (auto *CodeObjArg = Args.getLastArg(options::OPT_mcode_object_v3_legacy,
- 
options::OPT_mno_code_object_v3_legacy)) {
-if (CodeObjArg->getOption().getID() == 
options::OPT_mcode_object_v3_legacy) {
-  D.Diag(diag::warn_drv_deprecated_arg) << "-mcode-object-v3" <<
-"-mllvm --amdhsa-c

[llvm-branch-commits] [clang] efc063b - Fix lit test failure due to 0b81d9

2020-12-07 Thread Yaxun Liu via llvm-branch-commits

Author: Yaxun (Sam) Liu
Date: 2020-12-07T19:50:21-05:00
New Revision: efc063b621ea0c4d1e452bcade62f7fc7e1cc937

URL: 
https://github.com/llvm/llvm-project/commit/efc063b621ea0c4d1e452bcade62f7fc7e1cc937
DIFF: 
https://github.com/llvm/llvm-project/commit/efc063b621ea0c4d1e452bcade62f7fc7e1cc937.diff

LOG: Fix lit test failure due to 0b81d9

These lit tests now require amdgpu-registered-target since they
use the clang driver, and the driver passes an LLVM option that
is available only if the AMDGPU target is registered.

Change-Id: I2df31967409f1627fc6d342d1ab5cc8aa17c9c0c

Added: 


Modified: 
clang/test/CodeGenOpenCL/amdgpu-debug-info-pointer-address-space.cl
clang/test/CodeGenOpenCL/amdgpu-debug-info-variable-expression.cl
clang/test/Driver/amdgpu-macros.cl
clang/test/Preprocessor/predefined-arch-macros.c

Removed: 




diff  --git 
a/clang/test/CodeGenOpenCL/amdgpu-debug-info-pointer-address-space.cl 
b/clang/test/CodeGenOpenCL/amdgpu-debug-info-pointer-address-space.cl
index f09981dfa0f3..ab625f3154b2 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-debug-info-pointer-address-space.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-debug-info-pointer-address-space.cl
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
 // RUN: %clang -cl-std=CL2.0 -emit-llvm -g -O0 -S -nogpulib -target 
amdgcn-amd-amdhsa -mcpu=fiji -o - %s | FileCheck %s
 // RUN: %clang -cl-std=CL2.0 -emit-llvm -g -O0 -S -nogpulib -target 
amdgcn-amd-amdhsa-opencl -mcpu=fiji -o - %s | FileCheck %s
 

diff  --git a/clang/test/CodeGenOpenCL/amdgpu-debug-info-variable-expression.cl 
b/clang/test/CodeGenOpenCL/amdgpu-debug-info-variable-expression.cl
index 4a4c8cc54eb3..a305875bcc66 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-debug-info-variable-expression.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-debug-info-variable-expression.cl
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
 // RUN: %clang -cl-std=CL2.0 -emit-llvm -g -O0 -S -nogpulib -target 
amdgcn-amd-amdhsa -mcpu=fiji -o - %s | FileCheck %s
 // RUN: %clang -cl-std=CL2.0 -emit-llvm -g -O0 -S -nogpulib -target 
amdgcn-amd-amdhsa-opencl -mcpu=fiji -o - %s | FileCheck %s
 

diff  --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 57b54acf85ab..e5611446eace 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
 // Check that appropriate macros are defined for every supported AMDGPU
 // "-target" and "-mcpu" options.
 

diff  --git a/clang/test/Preprocessor/predefined-arch-macros.c 
b/clang/test/Preprocessor/predefined-arch-macros.c
index 052fb3c1bbf3..254ca60af846 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
 // Begin X86/GCC/Linux tests 
 
 // RUN: %clang -march=i386 -m32 -E -dM %s -o - 2>&1 \



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/HIP: Remove REQUIRES windows from a test (PR #112411)

2024-10-15 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/112411


[llvm-branch-commits] [llvm] release/19.x: [cmake] Extend zstd.dll finding logic from MSVC to Clang (#121437) (PR #121755)

2025-01-06 Thread Yaxun Liu via llvm-branch-commits
Michał Górny


https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/121755


[llvm-branch-commits] [clang] [llvm] [OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support AMDGPU's generic target (PR #122629)

2025-01-14 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> Somewhere for the linker wrapper I just checked if the triple was recognized, 
> you could probably just take strings after the `-` until it stops working.

+1

It would be a bad user experience to break existing apps, and it would be low
risk to accept env+cpu as a valid cpu. So you could first assume an env is
present; if parsing the remainder as a cpu fails, fall back to assuming there
is no env and parse the remainder entirely as a cpu.
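The fallback order described above can be sketched roughly as follows; `isKnownGPU` is a hypothetical stand-in for clang's real GPU-name validation, and the names are illustrative only:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for clang's real GPU-name validation.
static bool isKnownGPU(const std::string &S) {
  return S == "gfx906" || S == "gfx1030" || S == "gfx10-3-generic";
}

// Parse the tail of a target string as "env-cpu" or bare "cpu".
// First assume an env component exists; if the remainder after the
// split is not a known GPU, fall back to treating the whole tail as
// the cpu with no env.
static std::pair<std::string, std::string>
splitEnvAndGPU(const std::string &Tail) {
  std::string::size_type Dash = Tail.find('-');
  if (Dash != std::string::npos) {
    std::string GPU = Tail.substr(Dash + 1);
    if (isKnownGPU(GPU))
      return {Tail.substr(0, Dash), GPU};
  }
  return {"", Tail}; // no env, or the split produced an unknown cpu
}
```

With this order, a legacy `opencl-gfx906` tail still splits into env and cpu, while a dash-containing cpu such as `gfx10-3-generic` survives intact.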

https://github.com/llvm/llvm-project/pull/122629


[llvm-branch-commits] [clang] [llvm] [OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support AMDGPU's generic target (PR #122629)

2025-01-14 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> First of all, I don't think it can fix the issue in a robust way. Second, 
> `generic` is already a valid target/cpu/offload target.
> 
> Unless we do something like, if the last part is `generic`, we keep looking 
> forward until we can construct a valid target. That has no difference than 
> doing pattern matching, which is back to my previous point about AMD special 
> sauce.
> 
> If we assert that the offload bundler is an AMD only thing (which TBH really 
> looks like so), I'm fine with adding a bunch of more special sauce here.

The offload bundler is not an AMD-only thing. At least the HIPSPV toolchain
uses it, which targets Intel GPUs.

Still, I think it is possible to make it generic with a minor assumption. Say
you are about to parse the final part of the target ID string, which may be
either "env-cpu" or a bare "cpu" without an env. clang has a function
getCanonicalProcessorName() which can check whether a string is a valid cpu
name. Just pass the remaining string to it. If it succeeds, the remainder is a
cpu without an env string. Otherwise, assume there is an env string that
contains no "-" and split it off.
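A rough sketch of that check order, with a hypothetical `isValidProcessorName` standing in for clang's `getCanonicalProcessorName()` and illustrative names only:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for clang's getCanonicalProcessorName() check.
static bool isValidProcessorName(const std::string &S) {
  return S == "gfx906" || S == "gfx10-3-generic";
}

// Interpret the tail of a target ID as "env-cpu" or bare "cpu": first
// ask whether the whole tail is a valid cpu name; only if that fails,
// assume a dash-free env prefix and split it off.
static std::pair<std::string, std::string>
parseTail(const std::string &Tail) {
  if (isValidProcessorName(Tail))
    return {"", Tail}; // whole tail is the cpu, no env string
  std::string::size_type Dash = Tail.find('-');
  if (Dash != std::string::npos)
    return {Tail.substr(0, Dash), Tail.substr(Dash + 1)};
  return {"", Tail};
}
```

Checking the whole tail first is what keeps generic targets like `gfx10-3-generic` from being split at their internal dashes.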

https://github.com/llvm/llvm-project/pull/122629


[llvm-branch-commits] [clang] [llvm] [OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support AMDGPU's generic target (PR #122629)

2025-01-14 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

How about assuming the strict triple format first? That would make the generic
GPU arch work with a strict triple.

If the first assumption fails, fall back to the legacy parsing, that is, assume
there is no '-' in the GPU arch and split at the rightmost '-'. This way, old
target ID strings with a non-strict triple still work.

https://github.com/llvm/llvm-project/pull/122629


[llvm-branch-commits] [clang] [llvm] [OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support AMDGPU's generic target (PR #122629)

2025-01-14 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> > > Still, I think it is possible to make it generic with minor assumption. 
> > > Let's say you are now about to parsing the final part of the target ID 
> > > string which may be either "env-cpu" or "cpu" without env.
> > 
> > 
> > This is not actually the issue. The issue is when the cpu is a generic 
> > target, such as `gfx10-3-generic`. By the current logic, the target id 
> > after split is `generic`, which is totally a valid one, and leave the rest 
> > with things like `hip-amd-amdhsa-amd-gfx10-3`.
> 
> That is probably due to this line 
> https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/OffloadBundler.cpp#L88
> 
> It assumes there is no '-' in GPU name.
> 
> we could add a loop. If that line fails, we will split at the second '-' from 
> right.

Ok, I get your point: since `generic` is a valid GPU name, it will stop there.

https://github.com/llvm/llvm-project/pull/122629


[llvm-branch-commits] [clang] [llvm] [OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support AMDGPU's generic target (PR #122629)

2025-01-14 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> > Still, I think it is possible to make it generic with minor assumption. 
> > Let's say you are now about to parsing the final part of the target ID 
> > string which may be either "env-cpu" or "cpu" without env.
> 
> This is not actually the issue. The issue is when the cpu is a generic 
> target, such as `gfx10-3-generic`. By the current logic, the target id after 
> split is `generic`, which is totally a valid one, and leave the rest with 
> things like `hip-amd-amdhsa-amd-gfx10-3`.

That is probably due to this line 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/OffloadBundler.cpp#L88

It assumes there is no '-' in the GPU name.

We could add a loop: if that line fails, split at the second '-' from the
right.

https://github.com/llvm/llvm-project/pull/122629


[llvm-branch-commits] [clang] [CUDA][HIP] fix virtual dtor host/device attr (PR #130126)

2025-03-19 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> @yxsamliu (or anyone else). If you would like to add a note about this fix in 
> the release notes (completely optional). Please reply to this comment with a 
> one or two sentence description of the fix. When you are done, please add the 
> release:note label to this PR.

Fixed an issue with the implicit device attributes of virtual destructors that
caused undefined symbols for CUDA/HIP programs that use std::string as a class
member with C++20 and MSVC.

https://github.com/llvm/llvm-project/pull/130126


[llvm-branch-commits] [clang] clang/AMDGPU: Stop looking for oclc_daz_opt_* control libraries (PR #134805)

2025-04-09 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.

LGTM.

Does this mean the device library has no code depending on the option
`-cl-denorms-are-zero`?

https://github.com/llvm/llvm-project/pull/134805


[llvm-branch-commits] [clang] [CUDA][HIP] fix virtual dtor host/device attr (#128926) (PR #130126)

2025-03-08 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/130126

When inferring the host/device attr of the virtual dtor of an explicit template
class instantiation, clang should be conservative. This guarantees that dtors
which may call host functions do not get an implicit device attr and therefore
will not be emitted on the device side.

Backports: 0f0665db067f d37a39207bc1

Fixes: #108548

>From 604bf957fa8a8932e5163d8b10ed910d9a944382 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 28 Feb 2025 09:58:19 -0500
Subject: [PATCH] [CUDA][HIP] fix virtual dtor host/device attr (#128926)

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative.
This guarantees dtors that may call host functions not to
have implicit device attr, therefore will not be emitted
on device side.

Backports: 0f0665db067f d37a39207bc1

Fixes: #108548
---
 clang/docs/HIPSupport.rst   |  20 ++
 clang/include/clang/Sema/Sema.h |   2 +-
 clang/lib/Sema/Sema.cpp |  43 +
 clang/lib/Sema/SemaCUDA.cpp |  23 ++-
 clang/lib/Sema/SemaDecl.cpp |  15 +
 clang/test/SemaCUDA/dtor.cu | 104 
 6 files changed, 204 insertions(+), 3 deletions(-)
 create mode 100644 clang/test/SemaCUDA/dtor.cu

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 481ed39230813..8f473c21e1918 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -286,6 +286,26 @@ Example Usage
   basePtr->virtualFunction(); // Allowed since obj is constructed in 
device code
}
 
+Host and Device Attributes of Default Destructors
+=================================================
+
+If a default destructor does not have explicit host or device attributes,
+clang infers these attributes based on the destructors of its data members
+and base classes. If any conflicts are detected among these destructors,
+clang diagnoses the issue. Otherwise, clang adds an implicit host or device
+attribute according to whether the data members' and base classes'
+destructors can execute on the host or device side.
+
+For explicit template classes with virtual destructors, which must be emitted,
+the inference adopts a conservative approach. In this case, implicit host or
+device attributes from member and base class destructors are ignored. This
+precaution is necessary because, although a constexpr destructor carries
+implicit host or device attributes, a constexpr function may call a
+non-constexpr function, which is by default a host function.
+
+Users can override the inferred host and device attributes of default
+destructors by adding explicit host and device attributes to them.
+
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index a30a7076ea5d4..af648d7f9c63f 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4336,11 +4336,11 @@ class Sema final : public SemaBase {
   // Whether the callee should be ignored in CUDA/HIP/OpenMP host/device check.
   bool shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee);
 
-private:
   /// Function or variable declarations to be checked for whether the deferred
   /// diagnostics should be emitted.
   llvm::SmallSetVector<Decl *, 4> DeclsToCheckForDeferredDiags;
 
+private:
   /// Map of current shadowing declarations to shadowed declarations. Warn if
   /// it looks like the user is trying to modify the shadowing declaration.
   llvm::DenseMap ShadowingDecls;
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 9507d7602aa40..e0eac690e6e65 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -1789,6 +1789,47 @@ class DeferredDiagnosticsEmitter
   Inherited::visitUsedDecl(Loc, D);
   }
 
+  // Visit member and parent dtors called by this dtor.
+  void VisitCalledDestructors(CXXDestructorDecl *DD) {
+const CXXRecordDecl *RD = DD->getParent();
+
+// Visit the dtors of all members
+for (const FieldDecl *FD : RD->fields()) {
+  QualType FT = FD->getType();
+  if (const auto *RT = FT->getAs<RecordType>())
+if (const auto *ClassDecl = dyn_cast<CXXRecordDecl>(RT->getDecl()))
+  if (ClassDecl->hasDefinition())
+if (CXXDestructorDecl *MemberDtor = ClassDecl->getDestructor())
+  asImpl().visitUsedDecl(MemberDtor->getLocation(), MemberDtor);
+}
+
+// Also visit base class dtors
+for (const auto &Base : RD->bases()) {
+  QualType BaseType = Base.getType();
+  if (const auto *RT = BaseType->getAs<RecordType>())
+if (const auto *BaseDecl = dyn_cast<CXXRecordDecl>(RT->getDecl()))
+  if (BaseDecl->hasDefinition())
+if (CXXDestructorDecl *BaseDtor = BaseDecl->getDestructor())
+  asImpl().visitUsedDecl(BaseDtor->getLocation(), BaseDtor);
+}
+  }
+
+  void VisitDeclStmt(DeclStmt *DS) {
+   

[llvm-branch-commits] [clang] [CUDA][HIP] fix virtual dtor host/device attr (PR #130126)

2025-03-06 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/130126


[llvm-branch-commits] [clang] [CUDA][HIP] fix virtual dtor host/device attr (PR #130126)

2025-03-11 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

@tstellar Is this PR the right procedure for cherry-picking a fix to an LLVM
release branch? Thanks.

https://github.com/llvm/llvm-project/pull/130126


[llvm-branch-commits] [clang] [CUDA][HIP] fix virtual dtor host/device attr (PR #130126)

2025-03-12 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/130126

>From 64ecdf75962cb0e849ee2d39eca900329d3cc745 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 28 Feb 2025 09:58:19 -0500
Subject: [PATCH] [CUDA][HIP] fix virtual dtor host/device attr (#128926)

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative.
This guarantees dtors that may call host functions not to
have implicit device attr, therefore will not be emitted
on device side.

Backports: 0f0665db067f d37a39207bc1

Fixes: #108548
---
 clang/docs/HIPSupport.rst   |  20 ++
 clang/include/clang/Sema/Sema.h |   2 +-
 clang/lib/Sema/Sema.cpp |  43 +
 clang/lib/Sema/SemaCUDA.cpp |  23 ++-
 clang/lib/Sema/SemaDecl.cpp |  15 +
 clang/test/SemaCUDA/dtor.cu | 104 
 6 files changed, 204 insertions(+), 3 deletions(-)
 create mode 100644 clang/test/SemaCUDA/dtor.cu

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 481ed39230813..8f473c21e1918 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -286,6 +286,26 @@ Example Usage
   basePtr->virtualFunction(); // Allowed since obj is constructed in 
device code
}
 
+Host and Device Attributes of Default Destructors
+=================================================
+
+If a default destructor does not have explicit host or device attributes,
+clang infers these attributes based on the destructors of its data members
+and base classes. If any conflicts are detected among these destructors,
+clang diagnoses the issue. Otherwise, clang adds an implicit host or device
+attribute according to whether the data members' and base classes'
+destructors can execute on the host or device side.
+
+For explicit template classes with virtual destructors, which must be emitted,
+the inference adopts a conservative approach. In this case, implicit host or
+device attributes from member and base class destructors are ignored. This
+precaution is necessary because, although a constexpr destructor carries
+implicit host or device attributes, a constexpr function may call a
+non-constexpr function, which is by default a host function.
+
+Users can override the inferred host and device attributes of default
+destructors by adding explicit host and device attributes to them.
+
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index a30a7076ea5d4..af648d7f9c63f 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4336,11 +4336,11 @@ class Sema final : public SemaBase {
   // Whether the callee should be ignored in CUDA/HIP/OpenMP host/device check.
   bool shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee);
 
-private:
   /// Function or variable declarations to be checked for whether the deferred
   /// diagnostics should be emitted.
   llvm::SmallSetVector<Decl *, 4> DeclsToCheckForDeferredDiags;
 
+private:
   /// Map of current shadowing declarations to shadowed declarations. Warn if
   /// it looks like the user is trying to modify the shadowing declaration.
   llvm::DenseMap ShadowingDecls;
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 9507d7602aa40..e0eac690e6e65 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -1789,6 +1789,47 @@ class DeferredDiagnosticsEmitter
   Inherited::visitUsedDecl(Loc, D);
   }
 
+  // Visit member and parent dtors called by this dtor.
+  void VisitCalledDestructors(CXXDestructorDecl *DD) {
+const CXXRecordDecl *RD = DD->getParent();
+
+// Visit the dtors of all members
+for (const FieldDecl *FD : RD->fields()) {
+  QualType FT = FD->getType();
+  if (const auto *RT = FT->getAs<RecordType>())
+if (const auto *ClassDecl = dyn_cast<CXXRecordDecl>(RT->getDecl()))
+  if (ClassDecl->hasDefinition())
+if (CXXDestructorDecl *MemberDtor = ClassDecl->getDestructor())
+  asImpl().visitUsedDecl(MemberDtor->getLocation(), MemberDtor);
+}
+
+// Also visit base class dtors
+for (const auto &Base : RD->bases()) {
+  QualType BaseType = Base.getType();
+  if (const auto *RT = BaseType->getAs<RecordType>())
+if (const auto *BaseDecl = dyn_cast<CXXRecordDecl>(RT->getDecl()))
+  if (BaseDecl->hasDefinition())
+if (CXXDestructorDecl *BaseDtor = BaseDecl->getDestructor())
+  asImpl().visitUsedDecl(BaseDtor->getLocation(), BaseDtor);
+}
+  }
+
+  void VisitDeclStmt(DeclStmt *DS) {
+// Visit dtors called by variables that need destruction
+for (auto *D : DS->decls())
+  if (auto *VD = dyn_cast<VarDecl>(D))
+if (VD->isThisDeclarationADefinition() &&
+VD->needsDestruction(S.Context)) {
+  QualType VT = VD->getType();
+  if (const auto *RT = VT->getAs<RecordType>())

[llvm-branch-commits] [llvm] AMDGPU: Start considering new atomicrmw metadata on integer operations (PR #122138)

2025-03-25 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> > Clang adds !amdgpu.no.fine.grained.memory and !amdgpu.no.remote.memory to 
> > any atomic instructions by default. I think this behavior is expected to 
> > keep ISA unchanged compared to the ISA before these metatadat were 
> > introduced. Did I miss anything?
> 
> All of the tests that fail are atomicMin_system and atomicMax_system. I would 
> expect that the explicit system scoped functions would not be using these 
> annotations. With this PR, these tests are switching from CAS expansion to 
> the direct instruction.
> 
> It just happens that the integer min and max are the cases we handled 
> conservatively before, so it's possible the test is just wrong in some way

I investigated a similar issue about `__hip_atomic_fetch_max` for float on 
gfx1100.

https://github.com/ROCm/hip-tests/blob/amd-staging/catch/unit/atomics/__hip_atomic_fetch_max.cc#L46

Basically, it sets the original memory to 5.5 and does an atomic float max with
7.5, so the expected value in memory should be 7.5, but we got 5.5.

It was triggered by my clang change to add no_remote_memory and
no_fine_grained_memory to the atomicRMW max instruction. Before my change, the
backend emitted global_atomic_cmpswp_b32; after my change, it emits
global_atomic_max_f32.

The test tried shared memory and global memory allocated in different ways: 

https://github.com/ROCm/hip-tests/blob/amd-staging/catch/include/resource_guards.hh#L63

shared memory passes

memory allocated by hipMalloc, malloc/hipHostRegister, hipMallocManaged passes

only memory allocated by hipHostMalloc fails

I think the difference is that hipHostMalloc allocates fine-grained memory, so
it violates the no_fine_grained_memory requirement imposed on the atomic max
instruction.

If I add [[clang::atomic(fine_grained_memory)]] to the block that calls 
`__hip_atomic_fetch_max` 
(https://github.com/ROCm/hip-tests/blob/amd-staging/catch/unit/atomics/min_max_common.hh#L85),
 the test passes. I think this verifies that the atomic attribute works.
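For reference, a minimal HIP sketch of that shape (hypothetical kernel and buffer names; this requires hipcc and a HIP runtime, so it is illustrative rather than a drop-in test):

```cpp
#include <hip/hip_runtime.h>

// Hypothetical kernel doing an atomic float max on memory that may be
// fine-grained (e.g. allocated with hipHostMalloc). The statement-level
// [[clang::atomic(fine_grained_memory)]] attribute tells clang not to
// attach !amdgpu.no.fine.grained.memory to the atomicrmw it emits, so
// the backend keeps an expansion that is safe for fine-grained memory.
__global__ void atomicMaxKernel(float *Mem, float Val) {
  [[clang::atomic(fine_grained_memory)]] {
    __hip_atomic_fetch_max(Mem, Val, __ATOMIC_RELAXED,
                           __HIP_MEMORY_SCOPE_AGENT);
  }
}
```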

https://github.com/llvm/llvm-project/pull/122138


[llvm-branch-commits] [clang] clang: Remove dest LangAS argument from performAddrSpaceCast (PR #138866)

2025-05-08 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.

The intention was to make the interface more flexible for cases where a target
may want to do some arithmetic directly based on the target address space
instead of emitting an addrspacecast instruction. However, many years have
passed and no target has done that, so I think we can remove it.

https://github.com/llvm/llvm-project/pull/138866


[llvm-branch-commits] [llvm] AMDGPU: Start considering new atomicrmw metadata on integer operations (PR #122138)

2025-05-16 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/122138


[llvm-branch-commits] [clang] [flang] [llvm] [AMDGPU] More radical feature initialization refactoring (PR #155222)

2025-08-26 Thread Yaxun Liu via llvm-branch-commits


@@ -364,8 +364,320 @@ StringRef AMDGPU::getCanonicalArchName(const Triple &T, 
StringRef Arch) {
   return T.isAMDGCN() ? getArchNameAMDGCN(ProcKind) : 
getArchNameR600(ProcKind);
 }
 
-void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple &T,
-  StringMap<bool> &Features) {
+static std::pair<AMDGPU::FeatureError, StringRef>
+insertWaveSizeFeature(StringRef GPU, const Triple &T,
+  const StringMap<bool> &DefaultFeatures,
+  StringMap<bool> &Features) {
+  const bool IsNullGPU = GPU.empty();
+  const bool TargetHasWave32 = DefaultFeatures.count("wavefrontsize32");
+  const bool TargetHasWave64 = DefaultFeatures.count("wavefrontsize64");
+  const bool HaveWave32 = Features.count("wavefrontsize32");
+  const bool HaveWave64 = Features.count("wavefrontsize64");
+  if (HaveWave32 && HaveWave64)
+return {AMDGPU::INVALID_FEATURE_COMBINATION,
+"'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive"};
+
+  if (HaveWave32 && !IsNullGPU && TargetHasWave64)
+return {AMDGPU::UNSUPPORTED_TARGET_FEATURE, "wavefrontsize32"};
+
+  if (HaveWave64 && !IsNullGPU && TargetHasWave32)
+return {AMDGPU::UNSUPPORTED_TARGET_FEATURE, "wavefrontsize64"};
+
+  // Don't assume any wavesize with an unknown subtarget.
+  // Default to wave32 if target supports both.
+  if (!IsNullGPU && !HaveWave32 && !HaveWave64 && !TargetHasWave32 &&
+  !TargetHasWave64)
+Features.insert(std::make_pair("wavefrontsize32", true));
+
+  for (const auto &Entry : DefaultFeatures) {
+if (!Features.count(Entry.getKey()))
+  Features[Entry.getKey()] = Entry.getValue();
+  }
+
+  return {NO_ERROR, StringRef()};
+}
+
+static void fillAMDGCNFeatureMap(StringRef GPU, const Triple &T,
+ StringMap<bool> &Features) {
+  AMDGPU::GPUKind Kind = parseArchAMDGCN(GPU);
+  switch (Kind) {
+  case GK_GFX1250:
+Features["ci-insts"] = true;
+Features["dot7-insts"] = true;
+Features["dot8-insts"] = true;
+Features["dl-insts"] = true;
+Features["16-bit-insts"] = true;
+Features["dpp"] = true;
+Features["gfx8-insts"] = true;
+Features["gfx9-insts"] = true;
+Features["gfx10-insts"] = true;
+Features["gfx10-3-insts"] = true;
+Features["gfx11-insts"] = true;
+Features["gfx12-insts"] = true;
+Features["gfx1250-insts"] = true;
+Features["bitop3-insts"] = true;
+Features["prng-inst"] = true;
+Features["tanh-insts"] = true;
+Features["tensor-cvt-lut-insts"] = true;
+Features["transpose-load-f4f6-insts"] = true;
+Features["bf16-trans-insts"] = true;
+Features["bf16-cvt-insts"] = true;
+Features["fp8-conversion-insts"] = true;
+Features["fp8e5m3-insts"] = true;
+Features["permlane16-swap"] = true;
+Features["ashr-pk-insts"] = true;
+Features["atomic-buffer-pk-add-bf16-inst"] = true;
+Features["vmem-pref-insts"] = true;
+Features["atomic-fadd-rtn-insts"] = true;
+Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+Features["atomic-flat-pk-add-16-insts"] = true;
+Features["atomic-global-pk-add-bf16-inst"] = true;
+Features["atomic-ds-pk-add-16-insts"] = true;
+Features["setprio-inc-wg-inst"] = true;
+Features["atomic-fmin-fmax-global-f32"] = true;
+Features["atomic-fmin-fmax-global-f64"] = true;
+Features["wavefrontsize32"] = true;
+break;
+  case GK_GFX1201:
+  case GK_GFX1200:
+  case GK_GFX12_GENERIC:
+Features["ci-insts"] = true;
+Features["dot7-insts"] = true;
+Features["dot8-insts"] = true;
+Features["dot9-insts"] = true;
+Features["dot10-insts"] = true;
+Features["dot11-insts"] = true;
+Features["dot12-insts"] = true;
+Features["dl-insts"] = true;
+Features["atomic-ds-pk-add-16-insts"] = true;
+Features["atomic-flat-pk-add-16-insts"] = true;
+Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+Features["atomic-buffer-pk-add-bf16-inst"] = true;
+Features["atomic-global-pk-add-bf16-inst"] = true;
+Features["16-bit-insts"] = true;
+Features["dpp"] = true;
+Features["gfx8-insts"] = true;
+Features["gfx9-insts"] = true;
+Features["gfx10-insts"] = true;
+Features["gfx10-3-insts"] = true;
+Features["gfx11-insts"] = true;
+Features["gfx12-insts"] = true;
+Features["atomic-fadd-rtn-insts"] = true;
+Features["image-insts"] = true;
+Features["fp8-conversion-insts"] = true;
+Features["atomic-fmin-fmax-global-f32"] = true;
+break;
+  case GK_GFX1153:
+  case GK_GFX1152:
+  case GK_GFX1151:
+  case GK_GFX1150:
+  case GK_GFX1103:
+  case GK_GFX1102:
+  case GK_GFX1101:
+  case GK_GFX1100:
+  case GK_GFX11_GENERIC:
+Features["ci-insts"] = true;
+Features["dot5-insts"] = true;
+Features["dot7-insts"] = true;
+Features["dot8-insts"] = true;
+Features["dot9-insts"] = true;
+Features["dot10-insts"] = true;
+Features["dot12-insts"] = true;
+Features["dl-insts"] = true;
+Features["16-bit-inst

[llvm-branch-commits] [clang] [flang] [llvm] [AMDGPU] More radical feature initialization refactoring (PR #155222)

2025-08-26 Thread Yaxun Liu via llvm-branch-commits


@@ -364,8 +364,320 @@ StringRef AMDGPU::getCanonicalArchName(const Triple &T, 
StringRef Arch) {
   return T.isAMDGCN() ? getArchNameAMDGCN(ProcKind) : 
getArchNameR600(ProcKind);
 }
 
-void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple &T,
-  StringMap<bool> &Features) {
+static std::pair<AMDGPU::FeatureError, StringRef>
+insertWaveSizeFeature(StringRef GPU, const Triple &T,
+  const StringMap<bool> &DefaultFeatures,
+  StringMap<bool> &Features) {
+  const bool IsNullGPU = GPU.empty();
+  const bool TargetHasWave32 = DefaultFeatures.count("wavefrontsize32");
+  const bool TargetHasWave64 = DefaultFeatures.count("wavefrontsize64");
+  const bool HaveWave32 = Features.count("wavefrontsize32");
+  const bool HaveWave64 = Features.count("wavefrontsize64");
+  if (HaveWave32 && HaveWave64)
+return {AMDGPU::INVALID_FEATURE_COMBINATION,
+"'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive"};
+
+  if (HaveWave32 && !IsNullGPU && TargetHasWave64)
+return {AMDGPU::UNSUPPORTED_TARGET_FEATURE, "wavefrontsize32"};
+
+  if (HaveWave64 && !IsNullGPU && TargetHasWave32)
+return {AMDGPU::UNSUPPORTED_TARGET_FEATURE, "wavefrontsize64"};
+
+  // Don't assume any wavesize with an unknown subtarget.
+  // Default to wave32 if target supports both.
+  if (!IsNullGPU && !HaveWave32 && !HaveWave64 && !TargetHasWave32 &&
+  !TargetHasWave64)
+Features.insert(std::make_pair("wavefrontsize32", true));
+
+  for (const auto &Entry : DefaultFeatures) {
+if (!Features.count(Entry.getKey()))
+  Features[Entry.getKey()] = Entry.getValue();
+  }
+
+  return {NO_ERROR, StringRef()};
+}
+

yxsamliu wrote:

Can we add a comment saying that \p Features contains overriding target features, and that this function returns the default target features with entries overridden by \p Features?

https://github.com/llvm/llvm-project/pull/155222
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [flang] [llvm] [AMDGPU] More radical feature initialization refactoring (PR #155222)

2025-08-26 Thread Yaxun Liu via llvm-branch-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/155222


[llvm-branch-commits] [clang] [flang] [llvm] [AMDGPU] More radical feature initialization refactoring (PR #155222)

2025-08-26 Thread Yaxun Liu via llvm-branch-commits

yxsamliu wrote:

> Do you want me to squash it with parent? I do not mind either way, just split 
> so it is easier to review.

Squashing seems to be cleaner.

https://github.com/llvm/llvm-project/pull/155222