Pierre-vh created this revision.
Pierre-vh added reviewers: arsenm, tra, foad.
Herald added subscribers: StephenFan, wenlei, tpr.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Device libs make use of patterns like this:

  __attribute__((target("gfx11-insts")))
  static unsigned do_intrin_stuff(void)
  {
    return __builtin_amdgcn_s_sendmsg_rtnl(0x0);
  }

For functions that are assumed to be eliminated if the currennt GPU target 
doesn't support them.
At O0 such functions aren't eliminated by common optimizations but often by 
AMDGPURemoveIncompatibleFunctions instead, which sees the "+gfx11-insts" 
attribute on, say, GFX9 and knows it's not valid, so it removes the function.

D142907 <https://reviews.llvm.org/D142907> accidentally made it so such 
attributes were dropped during bitcode linking, making it impossible for 
RemoveIncompatibleFunctions to catch the functions and causing ISel to catch 
fire eventually.

This fixes the issue and adds a new test to ensure we don't accidentally fall 
into this trap again.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D152251

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
  clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
  clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu

Index: clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
===================================================================
--- /dev/null
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
@@ -0,0 +1,47 @@
+// Verify the behavior of the +gfxN-insts in the way that
+// rocm-device-libs should be built with. e.g. If the device libraries has a function
+// with "+gfx11-insts", that attribute should still be present after linking and not
+// overwritten with the current target's settings.
+
+// This is important because at this time, many device-libs functions that are only
+// available on some GPUs put an attribute such as "+gfx11-insts" so that
+// AMDGPURemoveIncompatibleFunctions can detect & remove them if needed.
+
+// Build the fake device library in the way rocm-device-libs should be built.
+//
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa\
+// RUN:   -mcode-object-version=none -emit-llvm-bc \
+// RUN:   %S/Inputs/ocml-sample-target-attrs.cl -o %t.bc
+
+// Check the default behavior
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck %s
+
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc -emit-llvm %s -o - | FileCheck %s
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o -  | FileCheck %s
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o - | FileCheck %s
+
+
+// CHECK: define {{.*}} i32 @do_intrin_stuff() #[[ATTR:[0-9]+]]
+// CHECK: attributes #[[ATTR]] = {{.*}} "target-features"="+gfx11-insts"
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+typedef _Float16 half;
+
+extern "C" {
+__device__ unsigned do_intrin_stuff(void);
+
+__global__ void kernel_f16(unsigned* out) {
+    *out = do_intrin_stuff();
+  }
+}
Index: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
===================================================================
--- clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
@@ -132,26 +132,26 @@
 
 // Default mode relies on the implicit check-not for the denormal-fp-math.
 
-// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} }
+// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
+// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
 
 // FIXME: Should check-not "denormal-fp-math" within the line
-// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
+// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
+// IEEEF64-PSZF32: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
 
-// IEEEF32-PSZF64-DYNF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}}  }
+// IEEEF32-PSZF64-DYNF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}}  }
 // implicit check-not
 // implicit check-not
 
 
-// IEEEF32-PSZF64-DYNFULL: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF32-PSZF64-DYNFULL: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}}  }
-// IEEEF32-PSZF64-DYNFULL: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee"  {{.*}} "target-cpu"="gfx803" {{.*}} }
+// IEEEF32-PSZF64-DYNFULL: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} }
+// IEEEF32-PSZF64-DYNFULL: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}}  }
+// IEEEF32-PSZF64-DYNFULL: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee"  {{.*}} }
 
 
 // -mlink-bitcode-file doesn't internalize or propagate attributes.
-// NOINTERNALIZE-IEEEF32-PSZF64-DYNFULL: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// NOINTERNALIZE-IEEEF32-PSZF64-DYNFULL: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} }
 // NOINTERNALIZE-IEEEF32-PSZF64-DYNFULL: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math"="dynamic,dynamic" {{.*}} }
 // NOINTERNALIZE-IEEEF32-PSZF64-DYNFULL: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math"="dynamic,dynamic" {{.*}} }
Index: clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
===================================================================
--- /dev/null
+++ clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
@@ -0,0 +1,5 @@
+__attribute__((target("gfx11-insts")))
+unsigned do_intrin_stuff(void)
+{
+  return __builtin_amdgcn_s_sendmsg_rtnl(0x0);
+}
Index: clang/lib/CodeGen/CGCall.cpp
===================================================================
--- clang/lib/CodeGen/CGCall.cpp
+++ clang/lib/CodeGen/CGCall.cpp
@@ -2025,7 +2025,6 @@
   llvm::AttrBuilder FuncAttrs(F.getContext());
   getTrivialDefaultFunctionAttributes(F.getName(), F.hasOptNone(),
                                       /*AttrOnCallSite=*/false, FuncAttrs);
-  GetCPUAndFeaturesAttributes(GlobalDecl(), FuncAttrs);
 
   if (!WillInternalize && F.isInterposable()) {
     // Do not promote "dynamic" denormal-fp-math to this translation unit's
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to