hliao added inline comments.
================ Comment at: clang/lib/CodeGen/CodeGenPGO.cpp:839-840 + // Skip host-only functions in the CUDA device compilation and device-only + // functions in the host compilation. + if (CGM.getLangOpts().CUDA && ---------------- tra wrote: > hliao wrote: > > tra wrote: > > > hliao wrote: > > > > tra wrote: > > > > > We will still have around some functions that may never be used on > > > > > the host side (HD functions referenced from device code only). I'm > > > > > not sure if that's a problem for profiling, though. I wonder if we > > > > > can somehow tie `skipRegionMappingForDecl` to whether we've actually > > > > > codegen'ed the function. > > > > Skipping wrong-side functions here just makes the report not confusing > > > > as these functions are not emitted at all and are supposed never > > > > running on the host/device side. If we still create the mapping for > > > > them, e.g., we may report they have 0 runs instead of reporting nothing > > > > (just like comments between function.) That looks a little bit > > > > confusing. > > > > It seems the current PGO adds everything for coverage mapping and late > > > > prune them based on checks here. Just try to follow that logic to skip > > > > wrong-side functions. If we need to revise the original logic and > > > > generate coverage mapping for emitted functions only, the change here > > > > is unnecessary. > > > I'd add a comment here that this 'filter' is just a rough best-effort > > > approximation that still allows some effectively device-only Decls > > > through. > > > The output should still be correct, even though the functions will never > > > be used. Maybe add a TODO to deal with it if/when we know if the Decl was > > > codegen'ed. > > > > > Add that comment. But, I tend to not deal that "effectively" > > host-only/device-only ones as that should be developers' responsibility to > > handle them. The additional zero coverage mapping may be useful as well. If > > a function is really device-only but is attributed with HD, the 0 coverage > > may help developers correcting them. > It will be rather noisy in practice. A lot of code has either has been > written for NVCC or has to compile with it. NVCC does not have target > overloads, so sticking HD everywhere is pretty much the only practical way > to do it in complicated enough C++ code. Anything that uses Eigen or Thrust > will have tons of HD functions that are actually used only on one side. > Most HD interfaces in Eigen are designed to be used in both CPU and GPU. For GPU only ones, they are marked with `__device__` only. Thrust has a similar situation. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D85276/new/ https://reviews.llvm.org/D85276 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits