George,

Thanks for the references. However, I was not able to figure out whether what I am asking is so trivial that it is simply passed over or so subtle that it has been overlooked (I suspect the former). The binary tree algorithm in MPI_Allreduce takes a time proportional to 2*N*log_2(M), where N is the vector length and M is the number of processes. There is a divide-and-conquer strategy (http://www.hlrs.de/organization/par/services/models/mpi/myreduce.html) that MPICH uses to do an MPI_Reduce in a time proportional to N. Is this algorithm, or something equivalent, in OpenMPI at present? If so, how do I turn it on?
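To make the scaling argument concrete, here is a rough cost model in Python. It is only a sketch under stated assumptions: it counts vector elements moved along the critical path, ignores latency and the arithmetic itself, and assumes M is a power of two. The function names are made up for illustration, and the second function models the usual recursive-halving reduce-scatter followed by a recursive-doubling allgather, which is my reading of the divide-and-conquer idea rather than code taken from MPICH or OpenMPI.

```python
import math


def tree_allreduce_cost(n, m):
    """Elements moved on the critical path of a binary-tree allreduce.

    Assumption: a reduce up the tree followed by a broadcast down it,
    each level moving the full vector of n elements.
    """
    levels = math.ceil(math.log2(m))
    return 2 * n * levels


def halving_doubling_allreduce_cost(n, m):
    """Elements moved per process by reduce-scatter plus allgather.

    Assumption: m is a power of two. Recursive halving sends
    n/2, n/4, ..., n/m elements; recursive doubling sends the
    same amounts back in reverse order.
    """
    steps = int(math.log2(m))
    reduce_scatter = sum(n / 2 ** (k + 1) for k in range(steps))
    return 2 * reduce_scatter


if __name__ == "__main__":
    n = 1024
    for m in (2, 4, 8, 16, 32, 64):
        tree = tree_allreduce_cost(n, m)
        split = halving_doubling_allreduce_cost(n, m)
        print(f"M={m:3d}  tree={tree:8.0f}  halving/doubling={split:8.0f}")
```

For N=1024 and M=16 this gives 8192 elements for the tree against 1920 for the halving/doubling scheme. The second figure approaches 2*N as M grows, which is the "proportional to N" behaviour I am asking about.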
I also found that OpenMPI is sometimes very slow on MPI_Allreduce using TCP. Things are OK up to 16 processes, but at 24 and beyond the rates (message length divided by time) are as follows:

Message size    Throughput (Mbytes/sec)
(Kbytes)        M=24    M=32    M=48
      1         1.38    1.30    1.09
      2         2.28    1.94    1.50
      4         2.92    2.35    1.73
      8         3.56    2.81    1.99
     16         3.97    1.94    0.12
     32         0.34    0.24    0.13
     64         3.07    2.33    1.57
    128         3.70    2.80    1.89
    256         4.10    3.10    2.08
    512         4.19    3.28    2.08
   1024         4.36    3.36    2.17

Around 16-32 Kbytes there is a pronounced slowdown, roughly a factor of 10, which seems too much. Any idea what's going on?

Tony

-------------------------------
Tony Ladd
Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu