hdelan added inline comments.
================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:433
     } else {
-      // If no -O was passed, pass -O0 to ptxas -- no opt flag should correspond
-      // to no optimizations, but ptxas's default is -O3.
-      CmdArgs.push_back("-O0");
+      // If no -O was passed, pass -O3 to ptxas -- this makes ptxas's
+      // optimization level the same as the ptxjitcompiler.
----------------
tra wrote:
> hdelan wrote:
> > tra wrote:
> > > I think this would be contrary to the expectation that the lack of `-O`
> > > in clang means `do not optimize`, and that this generally applies to the
> > > whole compilation chain, including the assembler. Matching whatever
> > > NVIDIA tools do is an insufficient reason for breaking this assumption,
> > > IMO.
> > >
> > > If you do want to run optimized ptxas on unoptimized PTX, you can use
> > > `-Xcuda-ptxas -O3`.
> > I think for the average user, consistency across the `ptxjitcompiler` and
> > `ptxas` is far more important than assuming that no `-O` means no
> > optimization. I think most users will assume that, with no `-O`, whatever
> > tools are being used will take their default optimization level, which in
> > the case of clang is `-O0` and in the case of `ptxas` is `-O3`.
> >
> > We have had a few bugs with `ptxas`/`ptxjitcompiler` at higher optimization
> > levels, which were quite hard to pin down because offline `ptxas` and the
> > `ptxjitcompiler` were using different optimisation levels, so a bug would
> > appear in one and not the other. We are aware of this now, but this
> > inconsistency can produce bugs that are difficult to diagnose. Consistency
> > between the `ptxjitcompiler` and `ptxas` is therefore of practical benefit.
> > If we leave it as is, with `ptxas` defaulting to `-O0`, the benefit is
> > purely semantic rather than practical.
> > I think for the average user, consistency across the ptxjitcompiler and
> > ptxas is far more important than assuming that no -O means no optimization.
>
> The default is intended to provide the fewest surprises for the most users.
> There are more users of clang as a CUDA compiler than users of clang as a
> CUDA compiler who care about consistency with ptxjitcompiler. My point is
> that the improvements for a subset of users should be balanced against
> usability in the common case. In this case the benefit does not justify the
> downsides, IMO.
>
> Please add me as a reviewer when the patch is ready for public review, and
> we'll discuss it in the wider LLVM community.

We have come to the same conclusion: it is best to leave this unchanged
upstream. However, this change has been made locally in `intel/llvm`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116583/new/

https://reviews.llvm.org/D116583

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits