Did you build UCX with CUDA support (--with-cuda)?

Josh
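In case it helps, a minimal build sketch (the install prefixes, CUDA path, and -j count below are examples; adjust for your system):

    # UCX: enable CUDA and gdrcopy support at configure time
    ./contrib/configure-release --prefix=/opt/ucx \
        --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
    make -j8 && make install

    # Open MPI: point it at both CUDA and the CUDA-enabled UCX build
    ./configure --prefix=/opt/openmpi \
        --with-cuda=/usr/local/cuda --with-ucx=/opt/ucx
    make -j8 && make install

    # Quick check: a CUDA-enabled UCX should list cuda_copy/cuda_ipc
    # (and gdr_copy, if gdrcopy was found) among its transports
    /opt/ucx/bin/ucx_info -d | grep -i cuda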
On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users <users@lists.open-mpi.org> wrote:
> Hello OpenMPI Team,
>
> I'm trying to use CUDA-aware OpenMPI, but the system simply ignores the GPU
> and the code runs on the CPUs. I've tried different software but will focus
> on the OSU benchmarks (collective and pt2pt communications). Some data
> about the configuration of the system:
>
> - OFED v4.17-1-rc2 (the NIC is virtualized, but I also tried a Mellanox
>   card with MOFED a few days ago and found the same issue)
> - CUDA v10.1
> - gdrcopy v1.3
> - UCX 1.6.0
> - OpenMPI 4.0.1
>
> Everything looks good (CUDA programs work fine, MPI programs run on the
> CPUs without any problem), and ompi_info outputs what I was expecting
> (but maybe I'm missing something):
>
> mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
> mca:mpi:base:param:mpi_built_with_cuda_support:value:true
> mca:mpi:base:param:mpi_built_with_cuda_support:source:default
> mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
> mca:mpi:base:param:mpi_built_with_cuda_support:level:4
> mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not
> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
> mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
> mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
> mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
> mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
>
> The available BTLs are the usual self, openib, tcp & vader, plus smcuda,
> uct & usnic. The full output from ompi_info is attached. If I run with
> '--mca opal_cuda_verbose 10', it doesn't output anything, which seems to
> agree with the lack of GPU use. Adding '--mca btl smcuda' makes no
> difference. I have also tried telling the benchmark to use host and device
> buffers (e.g. mpirun -np 2 ./osu_latency D H), but with the same result.
> I am probably missing something but am not sure where else to look or
> what else to try.
>
> Thank you,
>
> AFernandez
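A hedged run-time sketch to go with the question above (paths and process counts are examples; note that the D/H buffer arguments are only accepted when the OSU micro-benchmarks themselves were configured with CUDA support):

    # Rebuild the OSU micro-benchmarks with CUDA enabled; otherwise the
    # 'D' (device buffer) argument is rejected:
    ./configure CC=mpicc --enable-cuda \
        --with-cuda-include=/usr/local/cuda/include \
        --with-cuda-libpath=/usr/local/cuda/lib64
    make

    # Force the UCX PML and request device-to-device buffers; raising the
    # UCX log level shows which transports are actually selected:
    mpirun -np 2 --mca pml ucx -x UCX_LOG_LEVEL=info ./osu_latency D D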