hliao added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenPGO.cpp:839-840
 
+  // Skip host-only functions in the CUDA device compilation and device-only
+  // functions in the host compilation.
+  if (CGM.getLangOpts().CUDA &&
----------------
tra wrote:
> hliao wrote:
> > tra wrote:
> > > hliao wrote:
> > > > tra wrote:
> > > > > We will still have around some functions that may never be used on 
> > > > > the host side (HD functions referenced from device code only).  I'm 
> > > > > not sure if that's a problem for profiling, though. I wonder if we 
> > > > > can somehow tie `skipRegionMappingForDecl` to whether we've actually 
> > > > > codegen'ed the function. 
> > > > Skipping wrong-side functions here just makes the report not confusing 
> > > > as these functions are not emitted at all and are supposed never 
> > > > running on the host/device side. If we still create the mapping for 
> > > > them, e.g., we may report they have 0 runs instead of reporting nothing 
> > > > (just like comments between function.) That looks a little bit 
> > > > confusing.
> > > > It seems the current PGO adds everything for coverage mapping and late 
> > > > prune them based on checks here. Just try to follow that logic to skip 
> > > > wrong-side functions. If we need to revise the original logic and 
> > > > generate coverage mapping for emitted functions only, the change here 
> > > > is unnecessary.
> > > I'd add a comment here that this 'filter' is just a rough best-effort 
> > > approximation that still allows some effectively device-only Decls 
> > > through.
> > > The output should still be correct, even though the functions will never 
> > > be used. Maybe add a TODO to deal with it if/when we know if the Decl was 
> > > codegen'ed.
> > > 
> > Add that comment. But, I tend to not deal that "effectively" 
> > host-only/device-only ones as that should be developers' responsibility to 
> > handle them. The additional zero coverage mapping may be useful as well. If 
> > a function is really device-only but is attributed with HD, the 0 coverage 
> > may help developers correcting them.
> It will be rather noisy in practice. A lot of code has either has been 
> written for NVCC or has to compile with it. NVCC does not have target 
> overloads, so sticking HD everywhere  is pretty much the only practical way 
> to do it in complicated enough C++ code.  Anything that uses Eigen or Thrust 
> will have tons of HD functions that are actually used only on one side. 
> 
Most HD interfaces in Eigen are designed to be used in both CPU and GPU. For 
GPU only ones, they are marked with `__device__` only. Thrust has a similar 
situation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85276/new/

https://reviews.llvm.org/D85276

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to