Hello OpenMPI users,

Is there any documentation on how MPI_Allreduce() is implemented? I’m
using it to sum GPU data. I wonder whether OpenMPI first sums across the
processes on the same node and then sums the intermediate results across
nodes. That would be preferable, since it reduces cross-node
communication and should therefore be faster.

I’m using OpenMPI 1.10.0 and CUDA 7.0. I need to sum 40 million floats
on 6 nodes, each running 4 processes (24 processes in total). The nodes
are connected via InfiniBand.

Thanks very much!

Best,
Yang

------------------------------------------------------------------------

Yang ZHANG

PhD candidate

Networking and Wide-Area Systems Group
Computer Science Department
New York University

715 Broadway Room 705
New York, NY 10003