Re: [petsc-dev] -with-kokkos-cuda-arch=AMPERE80 nonsense

Barry Smith Mon, 05 Apr 2021 20:43:35 -0700

  Junchao,

    I hope my latest MRs manage that for the current generation of those
values. If not, we need refinement.


  Barry


> On Apr 5, 2021, at 9:30 PM, Junchao Zhang <[email protected]> wrote:
> 
> 
> 
> 
> On Mon, Apr 5, 2021 at 7:33 PM Jeff Hammond <[email protected]> wrote:
> NVCC has supported multi-versioned "fat" binaries since I worked for Argonne.
> Libraries should figure out what the oldest hardware they care about is and
> then compile for everything from that point forward.  Kepler (3.5) is the oldest
> version any reasonable person should be thinking about at this point.  The
> oldest thing I know of in the DOE HPC fleet is Pascal (6.x).  Volta and
> Turing are 7.x and Ampere is 8.x.
> 
> The biggest architectural changes came with unified memory
> (https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
> cooperative groups (https://developer.nvidia.com/blog/cooperative-groups/,
> in CUDA 9), but Kokkos doesn't use the latter.  Both features can be used on
> quite old GPU architectures, although the performance is better on newer ones.
> 
> I haven't dug into what Kokkos and PETSc are doing but the direct use of this 
> stuff in CUDA is well-documented, certainly as well as the CPU switches for 
> x86 binaries in the Intel compiler are.
> 
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
> 
> Devices with the same major revision number are of the same core 
> architecture. The major revision number is 8 for devices based on the NVIDIA 
> Ampere GPU architecture, 7 for devices based on the Volta architecture, 6 for 
> devices based on the Pascal architecture, 5 for devices based on the Maxwell 
> architecture, 3 for devices based on the Kepler architecture, 2 for devices 
> based on the Fermi architecture, and 1 for devices based on the Tesla 
> architecture.
> Kokkos has config options Kokkos_ARCH_TURING75, Kokkos_ARCH_VOLTA70, 
> Kokkos_ARCH_VOLTA72.    Any idea how one can map compute capability versions 
> to arch names?
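
One possible answer to the mapping question is a plain lookup table keyed on the (major, minor) compute capability. The sketch below is only an illustration: the table name and both helper functions are hypothetical, the AMPERE80, TURING75, VOLTA70, VOLTA72, PASCAL60 and PASCAL61 entries come from this thread, and the Kepler and Maxwell entries are assumed from the same naming pattern, so they should be checked against the Kokkos version in use.

```python
# Hypothetical table: (major, minor) compute capability -> Kokkos arch suffix.
KOKKOS_CUDA_ARCH = {
    (3, 5): "KEPLER35",   # assumed from the naming pattern
    (3, 7): "KEPLER37",   # assumed from the naming pattern
    (5, 0): "MAXWELL50",  # assumed from the naming pattern
    (5, 2): "MAXWELL52",  # assumed from the naming pattern
    (6, 0): "PASCAL60",
    (6, 1): "PASCAL61",
    (7, 0): "VOLTA70",
    (7, 2): "VOLTA72",
    (7, 5): "TURING75",
    (8, 0): "AMPERE80",
}


def kokkos_arch_name(major, minor):
    """Return the Kokkos arch suffix (e.g. 'VOLTA70'), or None if unknown."""
    return KOKKOS_CUDA_ARCH.get((major, minor))


def kokkos_cmake_option(major, minor):
    """Return the CMake switch for the given compute capability."""
    name = kokkos_arch_name(major, minor)
    if name is None:
        raise ValueError(f"no known Kokkos arch for compute capability {major}.{minor}")
    return f"-DKokkos_ARCH_{name}=ON"


if __name__ == "__main__":
    print(kokkos_cmake_option(8, 0))
```

A table like this has to be extended by hand for each new GPU generation, which is the maintenance cost being complained about in this thread.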
>  
> 
> 
> https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
> https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0
> 
> Programmatic querying can be done with the following
> (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):
> 
> cudaDeviceGetAttribute
> cudaDevAttrComputeCapabilityMajor 
> <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd220ff111a6616ab512e229d8f2f8bf87>:
>  Major compute capability version number;
> cudaDevAttrComputeCapabilityMinor 
> <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd2c981c76c9de58d39502e483a7b484c7>:
>  Minor compute capability version number;
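
As a rough illustration of that query, the sketch below calls cudaDeviceGetAttribute from Python through ctypes. The function name, the `libname` parameter and the fallback library name "libcudart.so" are assumptions made for this example. It returns None when no CUDA runtime or device is present, which also covers the case of a build box without a GPU.

```python
import ctypes
import ctypes.util

# Values of enum cudaDeviceAttr in the CUDA runtime headers.
CUDA_DEV_ATTR_CC_MAJOR = 75  # cudaDevAttrComputeCapabilityMajor
CUDA_DEV_ATTR_CC_MINOR = 76  # cudaDevAttrComputeCapabilityMinor


def query_compute_capability(device=0, libname=None):
    """Return (major, minor) for `device`, or None if it cannot be queried."""
    # Library lookup is an assumption; the runtime may be installed elsewhere.
    libname = libname or ctypes.util.find_library("cudart") or "libcudart.so"
    try:
        cudart = ctypes.CDLL(libname)
        get_attr = cudart.cudaDeviceGetAttribute
    except (OSError, AttributeError):
        return None  # no usable CUDA runtime on this machine

    result = []
    for attr in (CUDA_DEV_ATTR_CC_MAJOR, CUDA_DEV_ATTR_CC_MINOR):
        value = ctypes.c_int(0)
        # Signature: cudaDeviceGetAttribute(int *value, cudaDeviceAttr attr, int device)
        status = get_attr(ctypes.byref(value), ctypes.c_int(attr), ctypes.c_int(device))
        if status != 0:  # cudaSuccess is 0
            return None
        result.append(value.value)
    return tuple(result)


if __name__ == "__main__":
    print(query_compute_capability())
```

This only reports what is installed in the machine where it runs, so it cannot replace an explicit option when the code is built on one machine and run on another.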
> The compiler help tells me this, which can be cross-referenced with CUDA 
> documentation above.
> 
> $ /usr/local/cuda-10.0/bin/nvcc -h
> 
> Usage  : nvcc [options] <inputfile>
> 
> ...
> 
> Options for steering GPU code generation.
> =========================================
> 
> --gpu-architecture <arch>                  (-arch)
>         Specify the name of the class of NVIDIA 'virtual' GPU architecture for which
>         the CUDA input files must be compiled.
>         With the exception as described for the shorthand below, the architecture
>         specified with this option must be a 'virtual' architecture (such as compute_50).
>         Normally, this option alone does not trigger assembly of the generated PTX
>         for a 'real' architecture (that is the role of nvcc option '--gpu-code',
>         see below); rather, its purpose is to control preprocessing and compilation
>         of the input to PTX.
>         For convenience, in case of simple nvcc compilations, the following shorthand
>         is supported.  If no value for option '--gpu-code' is specified, then the
>         value of this option defaults to the value of '--gpu-architecture'.  In this
>         situation, as only exception to the description above, the value specified
>         for '--gpu-architecture' may be a 'real' architecture (such as a sm_50),
>         in which case nvcc uses the specified 'real' architecture and its closest
>         'virtual' architecture as effective architecture values.  For example, 'nvcc
>         --gpu-architecture=sm_50' is equivalent to 'nvcc --gpu-architecture=compute_50
>         --gpu-code=sm_50,compute_50'.
>         Allowed values for this option:  'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
> 
> --gpu-code <code>,...                      (-code)
>         Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
>         nvcc embeds a compiled code image in the resulting executable for each specified
>         <code> architecture, which is a true binary load image for each 'real' architecture
>         (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
>         During runtime, such embedded PTX code is dynamically compiled by the CUDA
>         runtime system if no binary load image is found for the 'current' GPU.
>         Architectures specified for options '--gpu-architecture' and '--gpu-code'
>         may be 'virtual' as well as 'real', but the <code> architectures must be
>         compatible with the <arch> architecture.  When the '--gpu-code' option is
>         used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
>         architecture.
>         For instance, '--gpu-architecture=compute_35' is not compatible with '--gpu-code=sm_30',
>         because the earlier compilation stages will assume the availability of 'compute_35'
>         features that are not present on 'sm_30'.
>         Allowed values for this option:  'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
> 
> --generate-code <specification>,...        (-gencode)
>         This option provides a generalization of the '--gpu-architecture=<arch> --gpu-code=<code>,
>         ...' option combination for specifying nvcc behavior with respect to code
>         generation.  Where use of the previous options generates code for different
>         'real' architectures with the PTX for the same 'virtual' architecture, option
>         '--generate-code' allows multiple PTX generations for different 'virtual'
>         architectures.  In fact, '--gpu-architecture=<arch> --gpu-code=<code>,
>         ...' is equivalent to '--generate-code arch=<arch>,code=<code>,...'.
>         '--generate-code' options may be repeated for different virtual architectures.
>         Allowed keywords for this option:  'arch','code'.
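
To make the "fat binary" idea concrete, here is a small sketch that turns a list of compute capabilities into repeated -gencode options: one 'real' sm_XY image per target, plus PTX for the newest target so that later GPUs can still JIT-compile it. The function name is hypothetical, and the choice to embed PTX only for the newest target is an assumption of this example, not something PETSc or Kokkos is known to do.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode options from an iterable of (major, minor) tuples."""
    targets = sorted(set(capabilities))
    if not targets:
        raise ValueError("at least one compute capability is required")

    flags = []
    for major, minor in targets:
        cc = f"{major}{minor}"
        # Binary image for one 'real' architecture.
        flags.append(f"-gencode arch=compute_{cc},code=sm_{cc}")

    # PTX for the newest 'virtual' architecture, compiled at run time
    # on GPUs that have no matching binary image.
    major, minor = targets[-1]
    newest = f"{major}{minor}"
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return flags


if __name__ == "__main__":
    print(" ".join(gencode_flags([(6, 0), (7, 0), (7, 5)])))
```

Each value has to be one the installed nvcc accepts; the CUDA 10.0 help text above, for instance, stops at compute_75 and sm_75.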
> 
> On Mon, Apr 5, 2021 at 1:19 PM Satish Balay via petsc-dev <[email protected]> wrote:
> This is NVIDIA's mess-up. Why isn't there a command that gives me these values
> [if they insist on this interface for nvcc]?
> 
> I see Barry wants configure to do something here - but whatever we do - we
> would be shifting the problem around.
> [even if we detect stuff - build box might not have the GPU used for runs.]
>
> We have --with-cuda-arch - which I tried to remove from configure - but it's
> come back in a different form (--with-cuda-gencodearch)
> 
> And I see other packages:
> 
>   --with-kokkos-cuda-arch
> 
> Wrt spack - I'm having to do:
> 
> spack install xsdk+cuda ^magma cuda_arch=60
> 
> [magma uses CudaPackage() infrastructure in spack]
> 
> Satish
> 
> On Mon, 5 Apr 2021, Mills, Richard Tran via petsc-dev wrote:
> 
> > You raise a good point, Barry. I've been completely mystified by what some 
> > of these names even mean. What does "PASCAL60" vs. "PASCAL61" even mean? Do
> > you know where this is even documented? I can't really find anything
> > about it in the Kokkos documentation. The only thing I can really find is 
> > an issue or two about "hey, shouldn't our CMake stuff figure this out 
> > automatically" and then some posts about why it can't really do that. Not 
> > encouraging.
> > 
> > --Richard
> > 
> > On 4/3/21 8:42 PM, Barry Smith wrote:
> > 
> > 
> >   It would be very nice to NOT require PETSc users to provide this flag.
> > How the heck will they know what it should be when we cannot automate it
> > ourselves?
> >
> >   Any ideas of how this can be determined based on the current system?
> > NVIDIA does not help since these "advertising" names don't seem to
> > trivially map to information you can get from a particular GPU when you
> > are logged into it. For example, nvidia-smi doesn't use these names directly. Is
> > there some mapping from nvidia-smi to these names we could use? If we are
> > serious about having a non-trivial number of users utilizing GPUs, which we
> > need to be for the future, we cannot have these absurd demands in our
> > installation process.
> > 
> >   Barry
> > 
> > Does spack have some magic for this we could use?
> > 
> > 
> > 
> > 
> 
> 
> 
> -- 
> Jeff Hammond
> [email protected]
> http://jeffhammond.github.io/
