Junchao,
I hope my latest MR manages that for the current generation of those
values. If not, we need to refine it.
Barry
> On Apr 5, 2021, at 9:30 PM, Junchao Zhang <[email protected]> wrote:
>
>
>
>
> On Mon, Apr 5, 2021 at 7:33 PM Jeff Hammond <[email protected]
> <mailto:[email protected]>> wrote:
> NVCC has supported multi-versioned "fat" binaries since I worked for Argonne.
> Libraries should figure out what the oldest hardware they care about is and
> then compile for everything from that point forward. Kepler (3.5) is the oldest
> version any reasonable person should be thinking about at this point. The
> oldest thing I know of in the DOE HPC fleet is Pascal (6.x). Volta and
> Turing are 7.x and Ampere is 8.x.
>
> The biggest architectural changes came with unified memory
> (https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
> cooperative groups (https://developer.nvidia.com/blog/cooperative-groups/,
> in CUDA 9), but Kokkos doesn't use the latter. Both features can be used on
> quite old GPU architectures, although the performance is better on newer ones.
>
> I haven't dug into what Kokkos and PETSc are doing but the direct use of this
> stuff in CUDA is well-documented, certainly as well as the CPU switches for
> x86 binaries in the Intel compiler are.
>
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
>
> Devices with the same major revision number are of the same core
> architecture. The major revision number is 8 for devices based on the NVIDIA
> Ampere GPU architecture, 7 for devices based on the Volta architecture, 6 for
> devices based on the Pascal architecture, 5 for devices based on the Maxwell
> architecture, 3 for devices based on the Kepler architecture, 2 for devices
> based on the Fermi architecture, and 1 for devices based on the Tesla
> architecture.
> Kokkos has config options Kokkos_ARCH_TURING75, Kokkos_ARCH_VOLTA70,
> Kokkos_ARCH_VOLTA72. Any idea how one can map compute capability versions
> to arch names?
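
One way to sketch the mapping asked about above, assuming the Kokkos option names follow the pattern seen in this thread (family name followed by the two compute capability digits). The family table comes from the CUDA programming guide text quoted above, with 7.5 special-cased as Turing. The function name `kokkos_arch_name` is hypothetical, and not every name this produces is guaranteed to exist as a Kokkos option, so check the result against the Kokkos documentation.

```python
# Family names keyed by major compute capability, taken from the CUDA
# programming guide excerpt quoted in this thread.
FAMILY_BY_MAJOR = {8: "AMPERE", 7: "VOLTA", 6: "PASCAL", 5: "MAXWELL", 3: "KEPLER"}


def kokkos_arch_name(major, minor):
    """Guess a Kokkos_ARCH_* option name from a compute capability.

    Assumption: the naming pattern is inferred from the examples in this
    thread (Kokkos_ARCH_VOLTA70, Kokkos_ARCH_VOLTA72, Kokkos_ARCH_TURING75);
    it is not read from Kokkos itself.
    """
    if (major, minor) == (7, 5):
        family = "TURING"  # Turing shares major version 7 with Volta
    else:
        family = FAMILY_BY_MAJOR.get(major)
    if family is None:
        raise ValueError(f"unknown compute capability {major}.{minor}")
    return f"Kokkos_ARCH_{family}{major}{minor}"


print(kokkos_arch_name(7, 0))
print(kokkos_arch_name(7, 5))
```

The same two digits are what spack's cuda_arch=60 style setting uses, so a single detected (major, minor) pair could in principle feed both.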
>
>
>
> https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
> https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0
>
> Programmatic querying can be done with the following
> (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):
>
> cudaDeviceGetAttribute
> cudaDevAttrComputeCapabilityMajor
> <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd220ff111a6616ab512e229d8f2f8bf87>:
> Major compute capability version number;
> cudaDevAttrComputeCapabilityMinor
> <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd2c981c76c9de58d39502e483a7b484c7>:
> Minor compute capability version number;
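
A minimal sketch of that query from Python via ctypes, for a configure-style script that cannot compile CUDA code first. Assumptions: the enumerator values 75 and 76 for cudaDevAttrComputeCapabilityMajor/Minor match the cudaDeviceAttr enum in your toolkit headers, and the runtime library can be loaded under the name shown in the usage comment. The function name `compute_capability` is hypothetical.

```python
import ctypes

# Assumed enumerator values of cudaDevAttrComputeCapabilityMajor and
# cudaDevAttrComputeCapabilityMinor in the cudaDeviceAttr enum; verify them
# against driver_types.h in the toolkit you build with.
CUDA_DEV_ATTR_CC_MAJOR = 75
CUDA_DEV_ATTR_CC_MINOR = 76


def compute_capability(cudart, device=0):
    """Return (major, minor) for `device` using cudaDeviceGetAttribute.

    `cudart` is a loaded CUDA runtime library object (e.g. from ctypes.CDLL).
    """
    result = []
    for attr in (CUDA_DEV_ATTR_CC_MAJOR, CUDA_DEV_ATTR_CC_MINOR):
        value = ctypes.c_int(0)
        # Signature: cudaDeviceGetAttribute(int *value, cudaDeviceAttr attr, int device)
        status = cudart.cudaDeviceGetAttribute(ctypes.pointer(value), attr, device)
        if status != 0:  # cudaSuccess is 0
            raise RuntimeError(f"cudaDeviceGetAttribute failed with error {status}")
        result.append(value.value)
    return tuple(result)


# Usage on a machine with a GPU and the CUDA runtime installed
# (the library name is an assumption and varies by platform):
#   cudart = ctypes.CDLL("libcudart.so")
#   print(compute_capability(cudart))
```

This only helps when the build machine has the same GPU as the run machine, which is exactly the caveat raised elsewhere in this thread.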
> The compiler help tells me this, which can be cross-referenced with CUDA
> documentation above.
>
> $ /usr/local/cuda-10.0/bin/nvcc -h
>
> Usage : nvcc [options] <inputfile>
>
> ...
>
> Options for steering GPU code generation.
> =========================================
>
> --gpu-architecture <arch> (-arch)
> Specify the name of the class of NVIDIA 'virtual' GPU architecture for which
> the CUDA input files must be compiled.
> With the exception as described for the shorthand below, the architecture
> specified with this option must be a 'virtual' architecture (such as compute_50).
> Normally, this option alone does not trigger assembly of the generated PTX
> for a 'real' architecture (that is the role of nvcc option '--gpu-code',
> see below); rather, its purpose is to control preprocessing and compilation
> of the input to PTX.
> For convenience, in case of simple nvcc compilations, the following shorthand
> is supported. If no value for option '--gpu-code' is specified, then the
> value of this option defaults to the value of '--gpu-architecture'. In this
> situation, as only exception to the description above, the value specified
> for '--gpu-architecture' may be a 'real' architecture (such as a sm_50),
> in which case nvcc uses the specified 'real' architecture and its closest
> 'virtual' architecture as effective architecture values. For example, 'nvcc
> --gpu-architecture=sm_50' is equivalent to 'nvcc --gpu-architecture=compute_50
> --gpu-code=sm_50,compute_50'.
> Allowed values for this option: 'compute_30','compute_32','compute_35',
> 'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
> 'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
> 'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
> 'sm_75'.
>
> --gpu-code <code>,... (-code)
> Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
> nvcc embeds a compiled code image in the resulting executable for each specified
> <code> architecture, which is a true binary load image for each 'real' architecture
> (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
> During runtime, such embedded PTX code is dynamically compiled by the CUDA
> runtime system if no binary load image is found for the 'current' GPU.
> Architectures specified for options '--gpu-architecture' and '--gpu-code'
> may be 'virtual' as well as 'real', but the <code> architectures must be
> compatible with the <arch> architecture. When the '--gpu-code' option is
> used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
> architecture.
> For instance, '--gpu-architecture=compute_35' is not compatible with '--gpu-code=sm_30',
> because the earlier compilation stages will assume the availability of 'compute_35'
> features that are not present on 'sm_30'.
> Allowed values for this option: 'compute_30','compute_32','compute_35',
> 'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
> 'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
> 'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
> 'sm_75'.
>
> --generate-code <specification>,... (-gencode)
> This option provides a generalization of the '--gpu-architecture=<arch> --gpu-code=<code>,
> ...' option combination for specifying nvcc behavior with respect to code
> generation. Where use of the previous options generates code for different
> 'real' architectures with the PTX for the same 'virtual' architecture, option
> '--generate-code' allows multiple PTX generations for different 'virtual'
> architectures. In fact, '--gpu-architecture=<arch> --gpu-code=<code>,
> ...' is equivalent to '--generate-code arch=<arch>,code=<code>,...'.
> '--generate-code' options may be repeated for different virtual architectures.
> Allowed keywords for this option: 'arch','code'.
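
To make the "fat binary" suggestion concrete, here is a rough sketch of how a configure script might assemble repeated -gencode options from a list of compute capabilities, using only the arch=/code= keywords described in the help text above. The function name `gencode_flags` and its `ptx_for_newest` parameter are hypothetical. It emits one binary image per listed capability and, optionally, PTX for the newest one so that later GPUs can JIT-compile it at run time.

```python
def gencode_flags(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode options for a list of (major, minor) capabilities."""
    caps = sorted(set(capabilities))
    flags = []
    for major, minor in caps:
        cc = f"{major}{minor}"
        # Real ('sm_') binary image for this architecture.
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    if ptx_for_newest and caps:
        major, minor = caps[-1]
        cc = f"{major}{minor}"
        # Virtual ('compute_') PTX image for the newest architecture only.
        flags += ["-gencode", f"arch=compute_{cc},code=compute_{cc}"]
    return flags


# Example: Pascal, Volta, and Turing, as discussed in this thread.
print(" ".join(gencode_flags([(6, 0), (7, 0), (7, 5)])))
```

Whether the capability values are accepted still depends on the nvcc version; the allowed values listed above are those of CUDA 10.0.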
>
> On Mon, Apr 5, 2021 at 1:19 PM Satish Balay via petsc-dev
> <[email protected] <mailto:[email protected]>> wrote:
> This is an NVIDIA mess-up. Why isn't there a command that gives me these values
> [if they insist on this interface for nvcc]?
>
> I see Barry wants configure to do something here - but whatever we do - we
> would be shifting the problem around.
> [even if we detect stuff - the build box might not have the GPU used for runs.]
>
> We have --with-cuda-arch - which I tried to remove from configure - but it has
> come back in a different form (--with-cuda-gencodearch)
>
> And I see other packages:
>
> --with-kokkos-cuda-arch
>
> Wrt spack - I'm having to do:
>
> spack install xsdk+cuda ^magma cuda_arch=60
>
> [magma uses CudaPackage() infrastructure in spack]
>
> Satish
>
> On Mon, 5 Apr 2021, Mills, Richard Tran via petsc-dev wrote:
>
> > You raise a good point, Barry. I've been completely mystified by what some
> > of these names even mean. What does "PASCAL60" vs. "PASCAL61" even mean? Do
> > you know where this is even documented? I can't really find anything
> > about it in the Kokkos documentation. The only thing I can really find is
> > an issue or two about "hey, shouldn't our CMake stuff figure this out
> > automatically" and then some posts about why it can't really do that. Not
> > encouraging.
> >
> > --Richard
> >
> > On 4/3/21 8:42 PM, Barry Smith wrote:
> >
> >
> > It would be very nice to NOT require PETSc users to provide this flag.
> > How the heck will they know what it should be when we cannot automate it
> > ourselves?
> >
> > Any ideas of how this can be determined based on the current system?
> > NVIDIA does not help since these "advertising" names don't seem to
> > trivially map to information you can get from a particular GPU when you
> > are logged into it. For example nvidia-smi doesn't use these names directly. Is
> > there some mapping from nvidia-smi to these names we could use? If we are
> > serious about having a non-trivial number of users utilizing GPUs, which we
> > need to be for the future, we cannot have these absurd demands in our
> > installation process.
> >
> > Barry
> >
> > Does spack have some magic for this we could use?
> >
> >
> >
> >
>
>
>
> --
> Jeff Hammond
> [email protected] <mailto:[email protected]>
> http://jeffhammond.github.io/ <http://jeffhammond.github.io/>