[PATCH] D116583: Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3).

Hugh Delaney via Phabricator via cfe-commits Mon, 31 Jan 2022 01:56:46 -0800

hdelan added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:433
   } else {
-    // If no -O was passed, pass -O0 to ptxas -- no opt flag should correspond
-    // to no optimizations, but ptxas's default is -O3.
-    CmdArgs.push_back("-O0");
+    // If no -O was passed, pass -O3 to ptxas -- this makes ptxas's
+    // optimization level the same as the ptxjitcompiler.
----------------
tra wrote:
> hdelan wrote:
> > tra wrote:
> > > I think this would be contrary to the expectation that the lack of `-O` in
> > > clang means "do not optimize", and that this applies to the whole
> > > compilation chain, including the assembler. Matching whatever NVIDIA tools
> > > do is an insufficient reason for breaking this assumption, IMO.
> > > 
> > > If you do want to run optimized ptxas on unoptimized PTX, you can use
> > > `-Xcuda-ptxas -O3`.
> > I think for the average user, consistency between the `ptxjitcompiler` and
> > `ptxas` is far more important than the assumption that no `-O` means no
> > optimization. I think most users will assume that, without `-O`, each tool
> > in the chain uses its own default optimization level, which for clang is
> > `-O0` and for `ptxas` is `-O3`.
> > 
> > We have had a few bugs with `ptxas`/`ptxjitcompiler` at higher optimization
> > levels that were quite hard to pin down, because offline `ptxas` and
> > `ptxjitcompiler` were using different optimisation levels, so bugs appeared
> > in one and not the other. We are aware of this now, but the inconsistency
> > can still produce bugs that are difficult to diagnose. Consistency between
> > the `ptxjitcompiler` and `ptxas` is therefore of practical benefit. If
> > instead we leave it as is, with `ptxas` defaulting to `-O0`, the benefit is
> > purely semantic rather than practical.
> 
> The default is intended to cause the fewest surprises for the most users.
> Only a subset of clang's CUDA users care about consistency with
> ptxjitcompiler. My point is that improvements for a subset of users should be
> weighed against usability in the common case. In this case, IMO, the benefit
> does not justify the downsides.
> 
> Please add me as a reviewer when the patch is ready for public review, and
> we'll discuss it with the wider LLVM community.
We have come to the same conclusion: it is best to leave this unchanged
upstream. However, the change has been made locally in `intel/llvm`.
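To make the two defaults concrete, here is a minimal standalone C++ sketch of the choice being debated. It is a hypothetical model, not clang's driver API: the function `ptxasOptFlag` and the constants `kUpstreamDefault`/`kPatchedDefault` are illustrative names, and forwarding an explicit numeric level clamped to 0..3 is a simplifying assumption (the real driver also has to handle non-numeric levels such as `-Os` and `-Oz`). Only the branch taken when no `-O` is passed corresponds to the hunk under review.

```cpp
#include <algorithm>
#include <optional>
#include <string>

// Hypothetical defaults for the "no -O passed" case.
constexpr int kUpstreamDefault = 0;  // upstream clang: pass -O0 to ptxas
constexpr int kPatchedDefault = 3;   // D116583 / intel/llvm: pass -O3 to ptxas

// Simplified, hypothetical model of how the driver picks the ptxas -O flag.
// clangLevel: numeric level from clang's -O option, or std::nullopt if no -O
// was given. defaultWhenAbsent: level used when no -O was given.
std::string ptxasOptFlag(std::optional<int> clangLevel, int defaultWhenAbsent) {
  if (clangLevel) {
    // Assumption: an explicit level is forwarded, clamped to ptxas's 0..3.
    int level = std::clamp(*clangLevel, 0, 3);
    return "-O" + std::to_string(level);
  }
  // No -O given: this is the branch the patch changes.
  return "-O" + std::to_string(defaultWhenAbsent);
}
```

Under this model, `ptxasOptFlag(std::nullopt, kUpstreamDefault)` yields `-O0` and `ptxasOptFlag(std::nullopt, kPatchedDefault)` yields `-O3`, while an explicit `-O2` is forwarded unchanged either way. Users who want optimized ptxas output without optimizing the PTX can still pass `-Xcuda-ptxas -O3`, as suggested above.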


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116583/new/

https://reviews.llvm.org/D116583

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
