tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.
LGTM overall, with a few test nits.

================
Comment at: clang/lib/AST/ASTContext.cpp:12300
+           (D->hasAttr<CUDAGlobalAttr>() &&
+            GetGVALinkageForFunction(cast<FunctionDecl>(D),
+                                     /*IgnoreCUDAGlobalAttr=*/true) ==
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > yaxunl wrote:
> > > > > tra wrote:
> > > > > > Perhaps we don't need to change the public AST API and plumb
> > > > > > `IgnoreCUDAGlobalAttr` through.
> > > > > > We could create a CUDA-aware static version of
> > > > > > `GetGVALinkageForCudaKernel` instead, which would call
> > > > > > `adjustGVALinkageForExternalDefinitionKind(...,
> > > > > > adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true))`.
> > > > > We could have a static function, but it would have to be
> > > > > `GetGVALinkageForCUDAKernelWithoutGlobalAttr`, since we need to know
> > > > > the linkage of the kernel assuming it has no `__global__` attribute.
> > > > >
> > > > > If you think it is OK, I can make the change.
> > > > No point making public what's of no use to anybody other than this
> > > > particular instance.
> > > >
> > > > Come to think of it, we don't even need a function and could just do
> > > > ```
> > > > if (D->hasAttr<CUDAGlobalAttr>()) {
> > > >   bool OriginalKernelLinkage =
> > > >       adjustGVALinkageForExternalDefinitionKind(...,
> > > >           adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true));
> > > >   return OriginalKernelLinkage == GVA_Internal;
> > > > }
> > > > return (IsStaticVar && ....);
> > > > ```
> > > One disadvantage of this approach is that it duplicates the code in
> > > `GetGVALinkageForFunction`. In the future, if `GetGVALinkageForFunction`
> > > changes, the same change needs to be applied to the duplicated code,
> > > which is error-prone.
> > Good point. Looking at the code more closely, it appears that what we're
> > interested in is whether the kernel was internal and *became* externally
> > visible due to it being a kernel.
> >
> > Right now we're checking whether the function would normally be
> > `GVA_Internal` (shouldn't we have considered `GVA_DiscardableODR`, too,
> > BTW?). This is a somewhat indirect way of figuring out what we really need.
> >
> > The code that determines what we want is essentially the code in
> > `adjustGVALinkageForAttributes` that we're trying to enable/disable with
> > `ConsiderCudaGlobalAttr`.
> >
> > It can be easily extracted into a static function, which could then be used
> > both from `adjustGVALinkageForAttributes` (which would no longer need
> > `ConsiderCudaGlobalAttr`) and from here:
> >
> > ```
> > bool isInternalKernel(ASTContext *Context, Decl *D) {
> >   GVALinkage L = basicGVALinkageForFunction(Context, D);
> >   return (D->hasAttr<CUDAGlobalAttr>() &&
> >           (L == GVA_DiscardableODR || L == GVA_Internal));
> > }
> > ```
> >
> > This would both avoid logic duplication and better match our intent.
> >
> > Does it make sense? Or did I miss something else?
> `GVA_DiscardableODR` usually maps to `linkonce_odr` linkage in LLVM IR. It
> follows the ODR, so we should not make such kernels unique.
>
> If we use `isInternalKernel` in `adjustGVALinkageForAttributes`, there will
> be two calls to `basicGVALinkageForFunction` whenever
> `GetGVALinkageForFunction` is called, which seems inefficient. I think we can
> keep `GetGVALinkageForFunction` as it was and use
> `basicGVALinkageForFunction` directly in `mayExternalize`.
SGTM.
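For reference, a minimal sketch of the direction agreed on above: leave `GetGVALinkageForFunction` untouched and consult `basicGVALinkageForFunction` directly from `mayExternalize`. The wrapper name and exact plumbing below are assumptions for illustration, not the committed code:

```
// Sketch only: relies on the existing static helper
// basicGVALinkageForFunction() in clang/lib/AST/ASTContext.cpp.
// The wrapper name isInternalKernel() is hypothetical.
static bool isInternalKernel(const ASTContext &Ctx, const Decl *D) {
  const auto *FD = dyn_cast<FunctionDecl>(D);
  // A __global__ kernel that would have had internal linkage if the
  // CUDAGlobalAttr-driven promotion to external linkage were ignored.
  return FD && D->hasAttr<CUDAGlobalAttr>() &&
         basicGVALinkageForFunction(Ctx, FD) == GVA_Internal;
}
```

`mayExternalize` could then call `isInternalKernel(*this, D)` alongside its existing static-variable check, and `GVA_DiscardableODR` kernels would stay untouched, matching the ODR argument above.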
================
Comment at: clang/test/CodeGenCUDA/device-var-linkage.cu:1-2
// RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
// RUN:   -emit-llvm -o - -x hip %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC %s
----------------
This is odd -- the tests use `-x hip` and `-triple nvptx`. I think we need to
split them into HIP+amdgpu and CUDA+nvptx variants, as we now have
language-dependent behavior here and are interested in the language/triple
combinations that we actually use in practice (see the RUN-line sketch at the
end of this message).

================
Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:3
// RUN:   -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \
// RUN:   -emit-llvm -o - -x hip %s > %t.dev
----------------
We should have CUDA test variants here, too.

================
Comment at: clang/test/CodeGenCUDA/managed-var.cu:1
// REQUIRES: x86-registered-target, amdgpu-registered-target
----------------
The tests above do not have REQUIRES. Is it needed here?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124189/new/

https://reviews.llvm.org/D124189
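A hedged sketch of the HIP+amdgpu / CUDA+nvptx split suggested for device-var-linkage.cu above. The amdgcn triple, the extra CUDA check prefix, and the exact flag set are assumptions for illustration, not the RUN lines that were actually committed:

```
// Hypothetical split of the original RUN line; check prefixes are placeholders.
// HIP device compile for amdgcn:
// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -fcuda-is-device \
// RUN:   -emit-llvm -o - -x hip %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC %s
// CUDA device compile for nvptx (the .cu file defaults to CUDA without -x hip):
// RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
// RUN:   -emit-llvm -o - %s \
// RUN:   | FileCheck -check-prefixes=DEV,CUDA %s
```

The same pattern would apply to kernel-in-anon-ns.cu: duplicate the HIP RUN lines with an nvptx triple and without `-x hip`.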