>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Tuesday, May 27, 2014 4:07 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
>
>Answers inline too.
>>> 2) Is the absence of btl_openib_have_driver_gdr an indicator of
>>> something missing ?
>> Yes, that means that somehow the GPU Direct RDMA is not installed
>correctly. All that check does is make sure that the file
>/sys/kernel/mm/memory_peers/nv_mem/version exists.  Does that exist?
>>
>It does not. There is no
>/sys/kernel/mm/memory_peers/
>
>>> 3) Are the default parameters, especially the rdma limits and such,
>>> optimal for our configuration ?
>> That is hard to say.  GPU Direct RDMA does not work well when the GPU
>and IB card are not "close" on the system. Can you run "nvidia-smi topo -m"
>on your system?
>Running "nvidia-smi topo -m" gives me an error:
>
>[mboisson@login-gpu01 ~]$ nvidia-smi topo -m
>Invalid combination of input arguments. Please run 'nvidia-smi -h' for help.
Sorry, my mistake.  That may be a future feature.
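
In the meantime, something like lstopo from the hwloc package (which Open MPI
already uses internally) should show how the GPUs and the IB HCA are attached
to each socket. That is just a generic suggestion on my part, not anything
specific to the CUDA support:

lstopo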

>
>I could not find anything related to topology in the help. However, I can tell
>you the following, which I believe to be true:
>- GPU0 and GPU1 are on PCIe bus 0, socket 0
>- GPU2 and GPU3 are on PCIe bus 1, socket 0
>- GPU4 and GPU5 are on PCIe bus 2, socket 1
>- GPU6 and GPU7 are on PCIe bus 3, socket 1
>
>There is one IB card which I believe is on socket 0.
>
>
>I know that we do not have the Mellanox OFED; we use the Linux RDMA stack from
>CentOS 6.5. However, should that completely disable GDR within a single node?
>That is, does GDR _have_ to go through IB? I would assume that our lack of
>Mellanox OFED would result in no GDR inter-node, but GDR intra-node.

Without Mellanox OFED, GPU Direct RDMA is unavailable. However, the term
GPU Direct is somewhat overloaded, and I think that is where I was getting
confused. GPU Direct (also known as CUDA IPC) will work between GPUs that do
not have to cross a QPI link. That means I believe GPU0,1,2,3 should be able
to use GPU Direct among themselves, and likewise GPU4,5,6,7. In that case,
GPU memory does not need to be staged through host memory when transferring
between the GPUs. With Open MPI, there is an MCA parameter you can set to see
whether GPU Direct is being used between the GPUs:

--mca btl_smcuda_cuda_ipc_verbose 100
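
For example, assuming a two-rank run of whatever CUDA-aware binary you are
testing with (substitute your own program name and rank count):

mpirun -np 2 --mca btl_smcuda_cuda_ipc_verbose 100 ./your_cuda_app

With that verbosity turned up, the smcuda BTL should print whether CUDA IPC
was enabled between ranks that share a node.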

 Rolf

