yaxunl created this revision.
yaxunl added reviewers: tra, rjmccall.

https://reviews.llvm.org/D77954 caused regressions due to diagnostics in 
implicit
host device functions.

The implicit host device functions are often functions in system headers forced 
to be device host by pragmas.

Some of them are valid host device functions that can be emitted in both host 
and device compilation.

Some of them are valid host functions but invalid device functions. In device 
compilation they incur
diagnostics. However as long as these diagnostics are deferred and these 
functions are not emitted
this is fine.

Before D77954 <https://reviews.llvm.org/D77954>, in host device callers, host 
device candidates are not favored against wrong-sided candidates,
which preserves the overloading resolution result as if the caller and the 
candidates are host functions.
This makes sure the callee does not cause other issues, e.g. type mismatch, 
const-ness issues, etc. If the
selected function is a host device function, then it is a viable callee. If the 
selected function is a host
function, then the caller is not a valid host device function, and it results 
in a diagnostic but it can be deferred.

The problem is that we have to give host device candidates equal preference 
with wrong-sided candidates. If
the users really intend to favor host device candidate against wrong-sided 
candidate, they cannot get the
expected selection.

Ideally we should be able to defer all diagnostics for functions not sure to be 
emitted. In that case we can
have correct preference. If diagnostics occur due to overloading resolution 
change, as long as the function
is not emitted, it is fine.

Unfortunately it is not a trivial work to defer all diagnostics. Even deferring 
only overloading resolution related
diagnostics is not a simple work.

For now, it seems the most feasible workaround is to treat implicit host device 
function and explicit host
device function differently. Basically for implicit host device functions, keep 
the old behavior, i.e. give
host device candidates and wrong-sided candidates equal preference. For 
explicit host device functions,
favor host device candidates against wrong-sided candidates.

The rationale is that explicit host device functions are blessed by the user to 
be valid host device functions,
that is, they should not cause diagnostics in both host and device compilation. 
If diagnostics occur, user is
able to fix them. However, there is no guarantee that implicit host device 
function can be compiled in
device compilation, therefore we need to preserve its overloading resolution in 
device compilation.


https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===================================================================
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -463,3 +463,30 @@
 void foo() {
   __test<int>();
 }
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// Implicit host device caller choose implicit host device candidate and
+// wrong-sided candidate with equal preference.
+#ifdef __CUDA_ARCH__
+namespace ImplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving explicit host device candidate vs. wrong-sided candidate.
+// Explicit host device caller favors host device candidate against wrong-sided
+// candidate.
+namespace ExplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+inline __host__ __device__ void callee(int x);
+inline __host__ __device__ double explicit_hd_caller() {
+  return callee(1.0);
+  // expected-error@-1 {{cannot initialize return object of type 'double' with an rvalue of type 'void'}}
+}
+}
+#endif
Index: clang/lib/Sema/SemaOverload.cpp
===================================================================
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9517,11 +9517,28 @@
   // in global variable initializers once proper context is added.
   if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
     if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
-      auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
-      auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
+      bool IsCallerImplicitHD = false;
+      bool IsCand1ImplicitHD = false;
+      bool IsCand2ImplicitHD = false;
+      S.IdentifyCUDATarget(Caller, /*IgnoreImplicitHD=*/false,
+                           &IsCallerImplicitHD);
+      auto P1 =
+          S.IdentifyCUDAPreference(Caller, Cand1.Function, &IsCand1ImplicitHD);
+      auto P2 =
+          S.IdentifyCUDAPreference(Caller, Cand2.Function, &IsCand2ImplicitHD);
       assert(P1 != Sema::CFP_Never && P2 != Sema::CFP_Never);
-      auto Cand1Emittable = P1 > Sema::CFP_WrongSide;
-      auto Cand2Emittable = P2 > Sema::CFP_WrongSide;
+      // The implicit HD function may be a function in a system header which
+      // is forced by pragma. If we prefer HD candidates over wrong-sided
+      // candidates, overloading resolution may change, which may result in
+      // non-deferrable diagnostics. As a workaround, we let implicit HD
+      // candidates take equal preference as wrong-sided candidates. This will
+      // preserve the overloading resolution.
+      auto EmitThreshold =
+          (IsCallerImplicitHD && (IsCand1ImplicitHD || IsCand2ImplicitHD))
+              ? Sema::CFP_HostDevice
+              : Sema::CFP_WrongSide;
+      auto Cand1Emittable = P1 > EmitThreshold;
+      auto Cand2Emittable = P2 > EmitThreshold;
       if (Cand1Emittable && !Cand2Emittable)
         return true;
       if (!Cand1Emittable && Cand2Emittable)
Index: clang/lib/Sema/SemaCUDA.cpp
===================================================================
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -95,17 +95,25 @@
   return CFT_Host;
 }
 
-template <typename A>
-static bool hasAttr(const FunctionDecl *D, bool IgnoreImplicitAttr) {
-  return D->hasAttrs() && llvm::any_of(D->getAttrs(), [&](Attr *Attribute) {
-           return isa<A>(Attribute) &&
-                  !(IgnoreImplicitAttr && Attribute->isImplicit());
-         });
+template <typename AttrT>
+static bool hasAttr(const FunctionDecl *D, bool IgnoreImplicitAttr,
+                    bool *IsImplicitHDAttr = nullptr) {
+  if (auto *A = D->getAttr<AttrT>()) {
+    if (A->isImplicit()) {
+      if (IsImplicitHDAttr)
+        *IsImplicitHDAttr = true;
+      if (IgnoreImplicitAttr)
+        return false;
+    }
+    return true;
+  }
+  return false;
 }
 
 /// IdentifyCUDATarget - Determine the CUDA compilation target for this function
 Sema::CUDAFunctionTarget Sema::IdentifyCUDATarget(const FunctionDecl *D,
-                                                  bool IgnoreImplicitHDAttr) {
+                                                  bool IgnoreImplicitHDAttr,
+                                                  bool *IsImplicitHDAttr) {
   // Code that lives outside a function is run on the host.
   if (D == nullptr)
     return CFT_Host;
@@ -116,15 +124,23 @@
   if (D->hasAttr<CUDAGlobalAttr>())
     return CFT_Global;
 
-  if (hasAttr<CUDADeviceAttr>(D, IgnoreImplicitHDAttr)) {
-    if (hasAttr<CUDAHostAttr>(D, IgnoreImplicitHDAttr))
+  bool IsImplicitDevAttr = false;
+  bool IsImplicitHostAttr = false;
+  if (hasAttr<CUDADeviceAttr>(D, IgnoreImplicitHDAttr, &IsImplicitDevAttr)) {
+    if (hasAttr<CUDAHostAttr>(D, IgnoreImplicitHDAttr, &IsImplicitHostAttr)) {
+      assert(IsImplicitDevAttr == IsImplicitHostAttr);
+      if (IsImplicitHDAttr)
+        *IsImplicitHDAttr = IsImplicitDevAttr && IsImplicitHostAttr;
       return CFT_HostDevice;
+    }
     return CFT_Device;
   } else if (hasAttr<CUDAHostAttr>(D, IgnoreImplicitHDAttr)) {
     return CFT_Host;
   } else if (D->isImplicit() && !IgnoreImplicitHDAttr) {
     // Some implicit declarations (like intrinsic functions) are not marked.
     // Set the most lenient target on them for maximal flexibility.
+    if (IsImplicitHDAttr)
+      *IsImplicitHDAttr = true;
     return CFT_HostDevice;
   }
 
@@ -161,10 +177,12 @@
 
 Sema::CUDAFunctionPreference
 Sema::IdentifyCUDAPreference(const FunctionDecl *Caller,
-                             const FunctionDecl *Callee) {
+                             const FunctionDecl *Callee, bool *IsImplicitHD) {
   assert(Callee && "Callee must be valid.");
   CUDAFunctionTarget CallerTarget = IdentifyCUDATarget(Caller);
-  CUDAFunctionTarget CalleeTarget = IdentifyCUDATarget(Callee);
+  CUDAFunctionTarget CalleeTarget =
+      IdentifyCUDATarget(Callee,
+                         /*IgnoreImplicitHD=*/false, IsImplicitHD);
 
   // If one of the targets is invalid, the check always fails, no matter what
   // the other target is.
Index: clang/include/clang/Sema/Sema.h
===================================================================
--- clang/include/clang/Sema/Sema.h
+++ clang/include/clang/Sema/Sema.h
@@ -11656,7 +11656,8 @@
   /// Use this rather than examining the function's attributes yourself -- you
   /// will get it wrong.  Returns CFT_Host if D is null.
   CUDAFunctionTarget IdentifyCUDATarget(const FunctionDecl *D,
-                                        bool IgnoreImplicitHDAttr = false);
+                                        bool IgnoreImplicitHDAttr = false,
+                                        bool *IsImplicitHDAttr = nullptr);
   CUDAFunctionTarget IdentifyCUDATarget(const ParsedAttributesView &Attrs);
 
   /// Gets the CUDA target for the current context.
@@ -11683,9 +11684,12 @@
   ///               nullptr in case of global context.
   /// \param Callee target function
   ///
+  /// \param IsImplicitHD callee is an implicit host device function
+  ///
   /// \returns preference value for particular Caller/Callee combination.
   CUDAFunctionPreference IdentifyCUDAPreference(const FunctionDecl *Caller,
-                                                const FunctionDecl *Callee);
+                                                const FunctionDecl *Callee,
+                                                bool *IsImplicitHD = nullptr);
 
   /// Determines whether Caller may invoke Callee, based on their CUDA
   /// host/device attributes.  Returns false if the call is not allowed.
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to