Tony,

What do you mean by TCP? Are you using an ethernet interconnect?

I have noticed a similar slowdown using LAM/MPI and the MPI_Alltoall primitive on our Solaris 10 cluster with gigabit ethernet and TCP. For a large number of nodes it could even come to a complete hang. Part of the problem lay in the ethernet network itself, and enabling hardware flow control on the ethernet interfaces and the switch led to a considerable improvement. With flow control, I could approach the full-duplex bandwidth (200 MB/s) for large buffer sizes. I achieved additional improvement by using optimized algorithms (thanks to George and others on this point), especially for smaller buffer sizes in the same range as yours. I did not study the MPI_Reduce case, but I suspect it would be similar.

If this is relevant to you, you may find that discussion archived somewhere, most probably on the LAM/MPI list, starting August or September 2005.

I did not experiment with Open MPI at that time due to portability problems on Solaris 10 Opteron platforms. Now that these problems have been solved, Open MPI is generally faster on our applications than LAM/MPI and MPICH.

Pierre.

George Bosilca wrote:

On Oct 28, 2006, at 6:51 PM, Tony Ladd wrote:

George

Thanks for the references. However, I was not able to figure out whether what I am asking is so trivial that it is simply passed over or so subtle that it has been overlooked (I suspect the former).

No. The answer to your question was in the articles. We have more than just the Rabenseifner reduce and all-reduce algorithms. Some of the most common collective communication calls have up to 15 different implementations in Open MPI. Of course, each of these implementations gives the best performance under some particular conditions. Unfortunately, there is no unique algorithm that gives the best performance in all cases. As we have to deal with multiple algorithms for each collective, we have to figure out which one is better and where. This usually depends on the number of nodes in the communicator, the message size, and the network properties. In a few words, it is difficult to choose the best one without prior knowledge of the network you are trying to use. This is something we are working on right now in Open MPI. Until then... it may happen that at some particular points the collective communications will not show the best possible performance. However, a slowdown by a factor of 10 is quite unbelievable. There might be something else going on there...

Thanks,
george.

PS: BTW, which version of Open MPI are you using? The one that delivers the best performance for the collective communications (at least on high-performance networks) is the nightly release of the 1.2 branch.

The binary tree algorithm in MPI_Allreduce takes a time proportional to 2*N*log_2(M), where N is the vector length and M is the number of processes. There is a divide-and-conquer strategy (http://www.hlrs.de/organization/par/services/models/mpi/myreduce.html) that MPICH uses to do an MPI_Reduce in a time proportional to N. Is this algorithm, or something equivalent, in Open MPI at present? If so, how do I turn it on? I also found that Open MPI is sometimes very slow on MPI_Allreduce using TCP.
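(An aside on the "how do I turn it on?" question: Open MPI's "tuned" collective component exposes MCA parameters for forcing a particular algorithm at run time. The lines below are a hedged sketch, not something posted in this thread; the parameter names are those of the tuned component in 1.2-era releases, the numeric algorithm codes are version-dependent, newer ompi_info versions may need --all or a higher --level to show these parameters, and ./my_benchmark is a placeholder executable.)

# List the algorithm-selection parameters your installation supports
ompi_info --param coll tuned | grep algorithm

# Force specific allreduce/reduce algorithms for one run; the integer
# values are version-dependent, and 0 restores the built-in decision logic
mpirun -np 48 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_allreduce_algorithm 4 \
    --mca coll_tuned_reduce_algorithm 3 \
    ./my_benchmark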
Things are OK up to 16 processes, but at 24 the rates (message length divided by time) are as follows:

Message size (KBytes)    Throughput (MBytes/sec)
                         M=24    M=32    M=48
    1                    1.38    1.30    1.09
    2                    2.28    1.94    1.50
    4                    2.92    2.35    1.73
    8                    3.56    2.81    1.99
   16                    3.97    1.94    0.12
   32                    0.34    0.24    0.13
   64                    3.07    2.33    1.57
  128                    3.70    2.80    1.89
  256                    4.10    3.10    2.08
  512                    4.19    3.28    2.08
 1024                    4.36    3.36    2.17

Around 16-32 KBytes there is a pronounced slowdown, roughly a factor of 10, which seems too much. Any idea what's going on?

Tony

-------------------------------
Tony Ladd
Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005

Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu

--
Support the SAUVONS LA RECHERCHE movement: http://recherche-en-danger.apinc.org/

Dr. Pierre VALIRON
Laboratoire d'Astrophysique
Observatoire de Grenoble / UJF
BP 53 F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787
Fax: +33 4 7644 8821
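(For completeness, a minimal sketch of the kind of measurement behind the table above, not Tony's actual benchmark: it times repeated MPI_Allreduce calls over a range of buffer sizes and reports message size divided by time, the same throughput definition used in the table.)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative MPI_Allreduce throughput sketch: for each buffer size,
 * time a number of iterations and print message size / average time. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int iters = 50;
    for (size_t bytes = 1024; bytes <= 1024 * 1024; bytes *= 2) {
        size_t n = bytes / sizeof(double);
        double *in  = malloc(bytes);
        double *out = malloc(bytes);
        for (size_t i = 0; i < n; i++) in[i] = (double)rank;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int it = 0; it < iters; it++)
            MPI_Allreduce(in, out, (int)n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / iters;   /* average time per call */

        if (rank == 0)
            printf("%8zu KB  %8.2f MB/s\n", bytes / 1024, bytes / t / 1.0e6);

        free(in);
        free(out);
    }

    MPI_Finalize();
    return 0;
}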