George
Thanks for the references. However, I was not able to figure out whether what
I am asking is so trivial that it is simply passed over or so subtle that it
has been overlooked (I suspect the former). The binary tree algorithm in
MPI_Allreduce takes a time proportional to 2*N*log_2(M), where N is the vector
length and M is the number of processes. There is a divide and conquer
strategy
(http://www.hlrs.de/organization/par/services/models/mpi/myreduce.html) that
mpich uses to do an MPI_Reduce in a time proportional to N. Is this algorithm,
or something equivalent, in OpenMPI at present? If so, how do I turn it on?
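To make the comparison concrete, here is a rough sketch of the two cost models. It assumes a purely bandwidth-bound network (latency ignored) and a unit cost per vector element sent. The function names are hypothetical, and the 2*N*(1 - 1/M) form for the divide and conquer scheme is my assumption of how the halving steps (N/2 + N/4 + ...) add up for a power-of-two M; for other M it is only an approximation.

```python
import math

def tree_cost(n, m):
    """Binary tree: reduce up, then broadcast down, sending the
    full n-element vector at each of the log2(m) levels."""
    return 2 * n * math.log2(m)

def halving_cost(n, m):
    """Divide and conquer (assumed model): each phase sends
    n/2 + n/4 + ... = n * (1 - 1/m) elements, and there are two phases."""
    return 2 * n * (1 - 1 / m)

# Ratio of the two models for the process counts discussed in this thread.
for m in (16, 24, 32, 48):
    ratio = tree_cost(1, m) / halving_cost(1, m)
    print(f"M={m:3d}  tree/halving = {ratio:.2f}")
```

Under these assumptions the halving cost stays below 2*N however large M gets, while the tree cost keeps growing with log_2(M).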
I also found that OpenMPI is sometimes very slow on MPI_Allreduce using TCP.
Things are OK up to 16 processes, but from 24 processes upward the rates
(message length divided by time) are as follows:
Message size    Throughput (Mbytes/sec)
(Kbytes)        M=24    M=32    M=48
      1         1.38    1.30    1.09
      2         2.28    1.94    1.50
      4         2.92    2.35    1.73
      8         3.56    2.81    1.99
     16         3.97    1.94    0.12
     32         0.34    0.24    0.13
     64         3.07    2.33    1.57
    128         3.70    2.80    1.89
    256         4.10    3.10    2.08
    512         4.19    3.28    2.08
   1024         4.36    3.36    2.17
Around 16-32 Kbytes there is a pronounced slowdown, roughly a factor of 10,
which seems too much. Any idea what's going on?
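For reference, the factor can be checked directly from the table. This is just a small sketch: the numbers are copied from the table above, and defining the slowdown as the 8 Kbyte rate divided by the worst rate at 16 or 32 Kbytes is my own choice, not a standard measure.

```python
# Throughput in Mbytes/sec, copied from the table above, keyed by process count M.
sizes_kb = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
throughput = {
    24: [1.38, 2.28, 2.92, 3.56, 3.97, 0.34, 3.07, 3.70, 4.10, 4.19, 4.36],
    32: [1.30, 1.94, 2.35, 2.81, 1.94, 0.24, 2.33, 2.80, 3.10, 3.28, 3.36],
    48: [1.09, 1.50, 1.73, 1.99, 0.12, 0.13, 1.57, 1.89, 2.08, 2.08, 2.17],
}

def slowdown(rates):
    """Rate at 8 Kbytes divided by the worst rate in the 16-32 Kbyte dip."""
    before = rates[sizes_kb.index(8)]
    dip = min(rates[sizes_kb.index(16)], rates[sizes_kb.index(32)])
    return before / dip

for m, rates in throughput.items():
    print(f"M={m}: slowdown factor {slowdown(rates):.1f}")
```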
Tony
-------------------------------
Tony Ladd
Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: [email protected]
Web: http://ladd.che.ufl.edu