yaxunl added a comment.

We need a lit test for the codegen of the clang builtin for cov 4/5/none, and a 
lit test showing that the branching code generated with cov none can be 
optimized away when linked with cov4 or cov5.



================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:383
+            CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,
+            llvm::GlobalValue::WeakODRLinkage);
+}
----------------

I am not sure weak_odr linkage will work when the code object version is none. 
It will cause a conflict when a module emitted with cov none is linked with a 
module emitted with cov4 or cov5. Also, when all modules are emitted with cov 
none, we end up with a linked module with cov none, and the workgroup size code 
will not work.

Probably we need to emit llvm.amdgcn.abi.version with external linkage for cov 
none.

Another issue is that llvm.amdgcn.abi.version is not internalized, so it is 
always loaded from memory even though it is in the constant address space. This 
will cause bad performance. Considering that device libs may use the clang 
builtin for workgroup size, the performance impact may be significant. To avoid 
performance degradation, we need to internalize it as early as possible in the 
optimization pipeline.
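
To illustrate the concern, here is a rough host-side C++ model of the 
version-dependent branch. It is only a sketch and not the actual clang codegen: 
the struct names, the g_abi_version global, and the helper functions are all 
hypothetical, and the value 500 for cov5 is an assumed encoding. The point is 
that when the version is a visible constant the branch folds away, while a 
global the optimizer cannot see through forces a load and a branch on every 
call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two places the workgroup size can live.
struct DispatchPacket { uint16_t workgroup_size_x; }; // cov4-style source
struct ImplicitArgs   { uint16_t group_size_x; };     // cov5-style source

// Assumed encoding of code object version 5 (illustrative only).
constexpr uint32_t kCov5 = 500;

// The version-dependent selection that the builtin has to perform.
constexpr uint16_t workgroup_size_x(uint32_t abi_version,
                                    const DispatchPacket &dp,
                                    const ImplicitArgs &ia) {
  return abi_version >= kCov5 ? ia.group_size_x : dp.workgroup_size_x;
}

// Models a non-internalized global: its value is not known at compile
// time, so every call must load it and take the branch.
uint32_t g_abi_version = 400;

uint16_t size_via_global(const DispatchPacket &dp, const ImplicitArgs &ia) {
  return workgroup_size_x(g_abi_version, dp, ia);
}

// Models an internalized constant: the version is known at compile time,
// so the comparison can be folded.
template <uint32_t Version>
constexpr uint16_t size_via_constant(const DispatchPacket &dp,
                                     const ImplicitArgs &ia) {
  return workgroup_size_x(Version, dp, ia);
}

// Evaluated entirely at compile time, so no branch survives for a
// constant version.
static_assert(size_via_constant<500>(DispatchPacket{64}, ImplicitArgs{256}) ==
                  256,
              "cov5 path selects the implicit-argument value");
static_assert(size_via_constant<400>(DispatchPacket{64}, ImplicitArgs{256}) ==
                  64,
              "cov4 path selects the dispatch-packet value");
```

In this model, size_via_global plays the role of the current patch and 
size_via_constant plays the role of the code after internalization and constant 
propagation.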


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730
