tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.

LGTM overall, with a few test nits.



================
Comment at: clang/lib/AST/ASTContext.cpp:12300
+         (D->hasAttr<CUDAGlobalAttr>() &&
+          GetGVALinkageForFunction(cast<FunctionDecl>(D),
+                                   /*IgnoreCUDAGlobalAttr=*/true) ==
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > yaxunl wrote:
> > > > > tra wrote:
> > > > > > Perhaps we don't need to change the public AST API and plumb 
> > > > > > `IgnoreCUDAGlobalAttr` through.
> > > > > > We could create a CUDA-aware static version of 
> > > > > > `GetGVALinkageForCudaKernel` instead, which would call 
> > > > > > `adjustGVALinkageForExternalDefinitionKind(..., 
> > > > > > adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true))`.
> > > > > We could have a static function but it would be 
> > > > > GetGVALinkageForCUDAKernelWithoutGlobalAttr since we need to know the 
> > > > > linkage of the kernel assuming it has no `__global__` attribute.
> > > > > 
> > > > > If you think it is OK I can make the change.
> > > > No point making public what's of no use to anybody other than this 
> > > > particular instance.
> > > > 
> > > > Come to think of it, we don't even need a function and could just do 
> > > > ```
> > > >   if (D->hasAttr<CUDAGlobalAttr>()) {
> > > >     GVALinkage OriginalKernelLinkage =
> > > >         adjustGVALinkageForExternalDefinitionKind(
> > > >             ..., adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true));
> > > >     return OriginalKernelLinkage == GVA_Internal;
> > > >   }
> > > >   return (IsStaticVar && ...);
> > > > ```
> > > > 
> > > > 
> > > One disadvantage of this approach is that it duplicates the code in 
> > > GetGVALinkageForFunction. In the future, if GetGVALinkageForFunction 
> > > changes, the same change needs to be applied to the duplicated code, 
> > > which is error-prone.
> > Good point.  Looking at the code closer, it appears that what we're 
> > interested in is whether the kernel was internal and *became* externally 
> > visible due to it being a kernel. 
> > 
> > Right now we're checking if the function would normally be `GVA_Internal` 
> > (shouldn't we have considered GVA_DiscardableODR, too, BTW?)
> > This is a somewhat indirect way of figuring out what we really need.
> > 
> > The code that determines what we want is essentially this code in 
> > adjustGVALinkageForAttributes that we're trying to enable/disable with 
> > `ConsiderCudaGlobalAttr`. 
> > 
> > It can be easily extracted into a static function, which could then be used 
> > from both `adjustGVALinkageForAttributes`, (which would no longer need 
> > `ConsiderCudaGlobalAttr`) and from here. 
> > 
> > ```
> > bool isInternalKernel(ASTContext *Context, Decl *D) {
> >   GVALinkage L = basicGVALinkageForFunction(Context, D);
> >   return (D->hasAttr<CUDAGlobalAttr>() &&
> >           (L == GVA_DiscardableODR || L == GVA_Internal));
> > }
> > ```
> > 
> > This would both avoid logic duplication and would better match our intent.
> > 
> > Does it make sense? Or did I miss something else?
> GVA_DiscardableODR usually maps to linkonce_odr linkage in LLVM IR. It 
> follows the ODR, so we should not make such functions unique.
> 
> If we use isInternalKernel in adjustGVALinkageForAttributes, there will be 
> two calls to basicGVALinkageForFunction whenever GetGVALinkageForFunction is 
> called, which seems inefficient. I think we can keep GetGVALinkageForFunction 
> as it was and use basicGVALinkageForFunction directly in mayExternalize.
SGTM.


================
Comment at: clang/test/CodeGenCUDA/device-var-linkage.cu:1-2
 // RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
 // RUN:   -emit-llvm -o - -x hip %s \
 // RUN:   | FileCheck -check-prefixes=DEV,NORDC %s
----------------
This is odd -- the tests use `-x hip` together with `-triple nvptx`.

I think we need to change them into HIP+amdgpu and CUDA+nvptx variants, as we 
now have language-dependent behavior here and are interested in the 
language/triple combinations that we actually use in practice.
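For illustration, the split could look roughly like the following (the amdgcn triple spelling and the `NORDC-CUDA` check prefix are placeholders, not a prescription):

```
// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -fcuda-is-device \
// RUN:   -emit-llvm -o - -x hip %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC %s
// RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
// RUN:   -emit-llvm -o - %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC-CUDA %s
```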


================
Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:3
 // RUN:   -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \
 // RUN:   -emit-llvm -o - -x hip %s > %t.dev
 
----------------
We should have CUDA test variants here, too.


================
Comment at: clang/test/CodeGenCUDA/managed-var.cu:1
 // REQUIRES: x86-registered-target, amdgpu-registered-target
 
----------------
The tests above do not have `REQUIRES`. Is it needed here?



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124189/new/

https://reviews.llvm.org/D124189
