Hi Akshay,

I'm building both UCX and OpenMPI as you describe. The relevant portions of the
build script read:

./configure --prefix=/usr/local/ucx-cuda-install \
  --with-cuda=/usr/local/cuda-10.1 --with-gdrcopy=/home/odyhpc/gdrcopy \
  --disable-numa

sudo make install

and then for OpenMPI:

./configure --with-cuda=/usr/local/cuda-10.1 \
  --with-cuda-libdir=/usr/local/cuda-10.1/lib64 \
  --with-ucx=/usr/local/ucx-cuda-install --prefix=/opt/openmpi

sudo make all install
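
To sanity-check both builds afterwards, I assume something like the following
should list the CUDA-related UCX transports and confirm the UCX pml component
is present (paths match the prefixes above):

/usr/local/ucx-cuda-install/bin/ucx_info -d | grep -iE 'cuda|gdr'
/opt/openmpi/bin/ompi_info | grep -i ucx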

As for the job submission, I have tried several combinations with different 
MCAs (yesterday I forgot to include the '--mca pml ucx' flag, as it had made no 
difference in the past). I just tried your suggested syntax (mpirun -np 2 --mca 
pml ucx --mca btl ^smcuda,openib ./osu_latency D H) with the same results. The 
latency times are of the same order no matter which flags I include. As for 
checking GPU usage, I'm not familiar with 'nvprof' and have simply been watching 
the basic continuous readout (nvidia-smi -l). I'm trying all of this in a cloud 
environment, and my suspicion is that there might be some interference (maybe 
from some virtualization component), but I cannot pinpoint the cause.
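
If nvprof is the better tool here, I assume a per-rank profile could be captured 
with something along these lines (untested on my side; the output file naming is 
only illustrative):

mpirun -np 2 --mca pml ucx nvprof -o osu.%q{OMPI_COMM_WORLD_RANK}.nvprof ./osu_latency D H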

Thanks,

Arturo

 

From: Akshay Venkatesh <akshay.v.3...@gmail.com> 
Sent: Friday, September 06, 2019 11:14 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Joshua Ladd <jladd.m...@gmail.com>; Arturo Fernandez <afernan...@odyhpc.com>
Subject: Re: [OMPI users] CUDA-aware codes not using GPU

 

Hi, Arturo.

 

Usually, for OpenMPI+UCX we use the following recipe 

 

for UCX:

 
./configure --prefix=/path/to/ucx-cuda-install --with-cuda=/usr/local/cuda \
  --with-gdrcopy=/usr

make -j install


then OpenMPI:

 

./configure --with-cuda=/usr/local/cuda --with-ucx=/path/to/ucx-cuda-install

make -j install
 

Can you run with the following to see if it helps: 

 
mpirun -np 2 --mca pml ucx --mca btl ^smcuda,openib ./osu_latency D H

There are details here that may be useful: 
https://www.open-mpi.org/faq/?category=runcuda#run-ompi-cuda-ucx  
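
If the numbers still look host-only, you can also try pinning the UCX transport 
selection explicitly and turning up UCX logging. A sketch (the transport list 
below is an assumption and may need adjusting for your fabric):

mpirun -np 2 --mca pml ucx \
  -x UCX_TLS=rc,sm,cuda_copy,cuda_ipc,gdr_copy \
  -x UCX_LOG_LEVEL=info \
  ./osu_latency D H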

 

Also, note that for short messages the D->H path for inter-node transfers may 
not involve any CUDA API calls (relevant if you're using nvprof to detect CUDA 
activity), because the GPUDirect RDMA path and gdrcopy are used instead.
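
To rule out gdrcopy itself, one quick check (assuming the tests that ship with 
the gdrcopy source tree were built; the paths are illustrative) is:

lsmod | grep gdrdrv      # the gdrcopy kernel module should be loaded
/path/to/gdrcopy/copybw  # small bandwidth test from the gdrcopy tree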

 

On Fri, Sep 6, 2019 at 7:36 AM Arturo Fernandez via users 
<users@lists.open-mpi.org> wrote:

Josh, 

Thank you. Yes, I built UCX with CUDA and gdrcopy support. I also had to 
disable numa (--disable-numa) as requested during the installation. 

AFernandez 

 

Joshua Ladd wrote 

Did you build UCX with CUDA support (--with-cuda)? 

 

Josh 

 

On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users 
<users@lists.open-mpi.org> wrote: 

Hello OpenMPI Team, 

I'm trying to use CUDA-aware OpenMPI but the system simply ignores the GPU and 
the code runs on the CPUs. I've tried different software but will focus on the 
OSU benchmarks (collective and pt2pt communications). Let me provide some data 
about the configuration of the system: 

-OFED v4.17-1-rc2 (the NIC is virtualized but I also tried a Mellanox card with 
MOFED a few days ago and found the same issue) 

-CUDA v10.1 

-gdrcopy v1.3 

-UCX 1.6.0 

-OpenMPI 4.0.1 

Everything looks good (CUDA programs work fine, MPI programs run on the CPUs 
without any problem), and ompi_info outputs what I was expecting (but maybe 
I'm missing something): 

mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
 

mca:mpi:base:param:mpi_built_with_cuda_support:value:true 

mca:mpi:base:param:mpi_built_with_cuda_support:source:default 

mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only 

mca:mpi:base:param:mpi_built_with_cuda_support:level:4 

mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer 
support is built into library or not 

mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false 

mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true 

mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no 

mca:mpi:base:param:mpi_built_with_cuda_support:type:bool 

mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
 

mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false 

The available btls are the usual self, openib, tcp & vader, plus smcuda, uct & 
usnic. The full output from ompi_info is attached. If I add the flag '--mca 
opal_cuda_verbose 10', it doesn't output anything, which seems to agree with 
the lack of GPU use. If I try '--mca btl smcuda', it makes no difference. 
I have also tried telling the program to use host and device buffers (e.g. mpirun 
-np 2 ./osu_latency D H) but get the same result. I am probably missing something 
but am not sure where else to look or what else to try. 
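
One more check I can think of, in case a different build is being picked up, is 
to confirm that the mpirun on the PATH is the CUDA/UCX-aware one:

which mpirun
ompi_info | grep -i 'configure command'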

Thank you, 

AFernandez 

-- 

-Akshay

NVIDIA

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
