tra added a comment.

In D106960#2925610 <https://reviews.llvm.org/D106960#2925610>, @ye-luo wrote:

> my second GPU is NVIDIA 3060Ti (sm_86)
> I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80.
>
> About sm_80 binaries being able to run on sm_86:
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#application-compatibility-on-ampere

Keep in mind that binaries compiled for sm_80 will likely run a lot slower 
on sm_86. sm_86 has distinctly different hardware, and the code generated for 
sm_80 will be sub-optimal for it.
I don't have Ampere cards to compare, but sm_70 binaries running on sm_75 
reached only about half the speed of the same code compiled for sm_75 
when operating on fp16.
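
If you want to get the most out of the 3060 Ti, rebuilding for it directly 
should help. A minimal sketch, just swapping the architecture in the flags 
you quoted (the source file and output names here are placeholders):

  # hypothetical rebuild targeting sm_86 (RTX 3060 Ti) instead of sm_80
  clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86 \
      app.cpp -o app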

NVIDIA didn't provide a performance tuning guide for Ampere, but here's what it 
had to say about Volta/Turing:
https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#tensor-operations

> Any binary compiled for Volta will run on Turing, but Volta binaries using 
> Tensor Cores will only be able to reach half of Turing's Tensor Core peak 
> performance. 
> Recompiling the binary specifically for Turing would allow it to reach the 
> peak performance.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960
