tra added a comment.

In D106960#2925610 <https://reviews.llvm.org/D106960#2925610>, @ye-luo wrote:
> my second GPU is NVIDIA 3060Ti (sm_86)
> I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80.
>
> About sm_80 binaries being able to run on sm_86:
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#application-compatibility-on-ampere

Keep in mind that binaries compiled for sm_80 will likely run a lot slower on sm_86. sm_86 has distinctly different hardware, and the code generated for sm_80 will be sub-optimal for it. I don't have Ampere cards to compare, but sm_70 binaries running on sm_75 reached only about half the speed of the same code compiled for sm_75 when operating on fp16.

NVIDIA didn't provide a performance tuning guide for Ampere, but here's what it had to say about Volta/Turing: https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#tensor-operations

> Any binary compiled for Volta will run on Turing, but Volta binaries using
> Tensor Cores will only be able to reach half of Turing's Tensor Core peak
> performance. Recompiling the binary specifically for Turing would allow it
> to reach the peak performance.
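For illustration, a minimal sketch of a compile line tuned for the sm_86 card, built from the flags quoted above (the source file name, output name, and -O3 level are placeholders, not anything from this review):

  # Target the RTX 3060 Ti (sm_86) directly instead of relying on
  # sm_80 forward compatibility:
  clang++ -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86 app.cpp -o app

Newer clang releases also accept --offload-arch=sm_86 as a shorter spelling for OpenMP offload, but whether your toolchain supports it depends on the clang version, so treat that variant as something to verify rather than a given.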