On Tue, Apr 6, 2021 at 2:08 PM Scott Kruger <[email protected]> wrote:
> I sent this yesterday but am having some strange mailing issues.
>
> On 2021-04-03 22:42, Barry Smith did write:
> >
> >   It would be very nice to NOT require PETSc users to provide this flag,
> > how the heck will they know what it should be when we cannot automate it
> > ourselves?
> >
> >   Any ideas of how this can be determined based on the current system?
> > NVIDIA does not help since these "advertising" names don't seem to
> > trivially map to information you can get from a particular GPU when you
> > are logged into it. For example nvidia-smi doesn't use these names
> > directly. Is there some mapping from nvidia-smi to these names we could
> > use? If we are serious about having a non-trivial number of users
> > utilizing GPUs, which we need to be for the future, we cannot have these
> > absurd demands in our installation process.
>
> The mapping of the Nvidia card to the gencodes and cuda arch is one of
> those annoyances that is so ridiculous it is hard to believe.
> The best reference I have found is this:
>
> https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
>
> To this end, the fact that Kokkos provides a mapping from colloquial
> card name to gencode/arch is a real benefit and useful. The problem is
> that this mapping is buried in their build system and lacks
> introspection.
>
> >
> >   Barry
> >
> >   Does spack have some magic for this we could use?
> >
>
> spack developed the archspec repo to abstract all of these issues:
> https://github.com/archspec/archspec

I do not love it. Besides the actual code (you can always complain about
code), they do not really do any tests. They go look in a few places where
the data should be. We can do the same thing in probably 10x less code. It
would be great to actually test the hardware to verify.
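As a rough illustration of the "less code" idea, here is a minimal Python sketch that asks the GPU for its compute capability and turns it into an nvcc gencode flag. It assumes a driver recent enough that nvidia-smi accepts the `compute_cap` query field (older drivers do not), and the helper names `gencode_flag` and `detect_compute_caps` are made up for this example; they are not part of BuildSystem, archspec, or Kokkos.

```python
import subprocess


def gencode_flag(compute_cap):
    """Turn a compute capability string such as '8.0' into an nvcc -gencode flag."""
    major, minor = compute_cap.strip().split(".")
    arch = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def detect_compute_caps():
    """Return the distinct compute capabilities reported by nvidia-smi.

    Assumption: the installed nvidia-smi supports the compute_cap query
    field. Returns an empty list if nvidia-smi is missing or the query fails.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return sorted({line.strip() for line in out.splitlines() if line.strip()})


if __name__ == "__main__":
    # Print one suggested flag per distinct GPU generation on this machine.
    for cap in detect_compute_caps():
        print(cap, gencode_flag(cap))
```

This only reads what the driver reports; it does not compile and run a kernel, so it would still not count as actually testing the hardware.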
  Thanks,

     Matt

> This is a *great* idea and eventually BuildSystem should incorporate it as
> the standard way of doing things; however, it has been focused mostly on
> the CPU issues, and is still under active development (my understanding
> is that pulling it out of spack and getting those interop issues
> sorted out is tangled up in how spack handles dependencies and
> compilers). It'd be nice if someone would go in and port the Kokkos gpu
> mappings to archspec as there is some great knowledge on these mappings
> buried in the Kokkos build system (not volunteering); i.e., translating
> that webpage to some real code (even if it is in make) is valuable.
>
> TL;DR: It's a known problem with currently no good solution AFAIK.
> Waiting until archspec gets further along seems like the best solution.
>
> Scott
>
> P.S. ROCm has rocminfo which also doesn't solve the problem but is at
> least sane.
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
