Hello OpenMPI users,

Is there any documentation on how MPI_Allreduce() is implemented? I’m using it to sum GPU data. I wonder whether OpenMPI first sums across the processes on the same node and then sums the intermediate results across nodes. That would seem preferable, since it reduces cross-node communication and should therefore be faster.
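To make the scheme I have in mind concrete, here is a minimal serial sketch in C. It makes no MPI calls and is not a claim about what OpenMPI does internally. The function name `two_level_allreduce_sum`, the rank-to-node mapping (rank r on node r / procs_per_node), and the gather-to-one-leader pattern in step 2 are all assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Serial model of a two-level (node-aware) allreduce sum. No MPI calls are
 * made: rank r is assumed to live on node r / procs_per_node, and vals[r]
 * stands in for that rank's single contribution.
 *
 * On return every vals[r] holds the global sum. The return value is the
 * number of values that crossed a node boundary in this model.
 */
size_t two_level_allreduce_sum(float *vals, size_t nodes, size_t procs_per_node)
{
    size_t cross_node_transfers = 0;

    if (nodes == 0 || procs_per_node == 0)
        return 0;

    /* Step 1: each node reduces onto its first rank (the node "leader").
       Only intra-node traffic is involved here. */
    for (size_t n = 0; n < nodes; ++n) {
        size_t leader = n * procs_per_node;
        for (size_t p = 1; p < procs_per_node; ++p)
            vals[leader] += vals[leader + p];
    }

    /* Step 2: leaders combine their partial sums across nodes. Modelled
       (as an assumption) as a gather onto leader 0 followed by a send
       back to every other leader. */
    float global = vals[0];
    for (size_t n = 1; n < nodes; ++n) {
        global += vals[n * procs_per_node];
        ++cross_node_transfers;          /* partial sum sent to leader 0 */
    }
    vals[0] = global;
    for (size_t n = 1; n < nodes; ++n) {
        vals[n * procs_per_node] = global;
        ++cross_node_transfers;          /* result sent back to leader n */
    }

    /* Step 3: each leader broadcasts the result inside its own node. */
    for (size_t n = 0; n < nodes; ++n) {
        size_t leader = n * procs_per_node;
        for (size_t p = 1; p < procs_per_node; ++p)
            vals[leader + p] = vals[leader];
    }

    return cross_node_transfers;
}
```

With 6 nodes and 4 processes per node, this model moves 10 values across node boundaries per reduced element: 5 partial sums into the first leader and 5 results back out. Everything else stays inside a node.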
I’m using OpenMPI 1.10.0 and CUDA 7.0. I need to sum 40 million float numbers on 6 nodes, each node running 4 processes. The nodes are connected via InfiniBand.

Thanks very much!

Best,
Yang

Yang ZHANG
PhD candidate
Networking and Wide-Area Systems Group
Computer Science Department
New York University
715 Broadway Room 705
New York, NY 10003