There is no short-term plan, but we are always looking at ways to improve things, so this could be looked at some time in the future.
Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
Sent: Wednesday, June 17, 2015 1:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

Hi Rolf,

Thank you very much for clarifying the problem. Is there any plan to support GPU RDMA for reductions in the future?

On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

Hi Fei:

The reduction support for CUDA-aware in Open MPI is rather simple. The GPU buffers are copied into temporary host buffers, and the reduction is then done with the host buffers. When the host reduction completes, the data is copied back into the GPU buffers. So there is no use of CUDA IPC or GPUDirect RDMA in the reduction.

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
Sent: Wednesday, June 17, 2015 1:08 PM
To: us...@open-mpi.org
Subject: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

Hi there,

I am running benchmarks on a GPU cluster whose nodes each have two CPU sockets and four K80s. Two K80s are connected to CPU socket 0 and the other two to socket 1; an IB ConnectX-3 (FDR) adapter is also under socket 1. We are using Linux's OFED, so I know there is no way to do GPU RDMA for inter-node communication. I can do intra-node IPC for MPI_Send and MPI_Recv between two K80s (4 GPUs in total) that are connected under the same socket (PCI-e switch), so I thought I could also do intra-node MPI_Reduce with IPC support in Open MPI 1.8.5.

The benchmark I was using is osu-micro-benchmarks-4.4.1, and I got the same results whether I used two GPUs under the same socket or under different sockets. The result was the same even when I used two GPUs in different nodes. Does MPI_Reduce use IPC intra-node? Do I have to install the Mellanox OFED stack to support GPU RDMA reduction on GPUs even when they are under the same PCI-e switch?

Thanks,

Fei Mao
High Performance Computing Technical Consultant
SHARCNET | http://www.sharcnet.ca
Compute/Calcul Canada | http://www.computecanada.ca
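To make Rolf's description concrete, below is a minimal sketch (not Open MPI's actual internal code) of the host-staged reduction he describes: the device buffers are copied to temporary host buffers, MPI_Reduce runs on the host, and the result is copied back to the GPU. The staged_reduce helper, the MPI_FLOAT/MPI_SUM choice, and the message size are illustrative assumptions; error checking and buffer initialization are omitted for brevity.

/* Sketch of the staging approach described above (assumed, simplified). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

static void staged_reduce(const float *d_send, float *d_recv, int count,
                          int root, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Stage the device data into temporary host buffers. */
    float *h_send = (float *)malloc(count * sizeof(float));
    float *h_recv = (rank == root) ? (float *)malloc(count * sizeof(float)) : NULL;
    cudaMemcpy(h_send, d_send, count * sizeof(float), cudaMemcpyDeviceToHost);

    /* The reduction itself happens entirely on the host. */
    MPI_Reduce(h_send, h_recv, count, MPI_FLOAT, MPI_SUM, root, comm);

    /* Copy the reduced result back to the GPU on the root rank. */
    if (rank == root)
        cudaMemcpy(d_recv, h_recv, count * sizeof(float), cudaMemcpyHostToDevice);

    free(h_send);
    free(h_recv);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int count = 1 << 20;   /* illustrative message size */
    float *d_send, *d_recv;
    cudaMalloc((void **)&d_send, count * sizeof(float));
    cudaMalloc((void **)&d_recv, count * sizeof(float));

    /* With a CUDA-aware build (as in the OSU benchmark), the device
     * pointers can be passed straight to MPI_Reduce:
     *     MPI_Reduce(d_send, d_recv, count, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
     * but, per the explanation above, the library currently stages through
     * host memory internally rather than using CUDA IPC or GPUDirect RDMA,
     * which is why the intra-socket and inter-node timings look alike. */
    staged_reduce(d_send, d_recv, count, 0, MPI_COMM_WORLD);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}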