In terms of theoretical complexity, the best Allreduce algorithm I'm aware of lets you decrease the complexity to O(log(size)) + O(count), where size is the number of participants in the communicator and count is the length of the message. You can find a reference to this algorithm here: http://www.hlrs.de/organization/par/services/models/mpi/myreduce.html
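To make the structure concrete, here is a minimal sketch of the reduce-scatter + allgather idea behind that kind of algorithm: the latency term grows like log(size) while each rank only pushes about O(count) data through the bandwidth phases. This is just an illustration, not the Open MPI implementation; it assumes a SUM over doubles and that count divides evenly by the communicator size (a real implementation handles the remainder and arbitrary ops).

#include <mpi.h>
#include <stdlib.h>

/* Illustration only: allreduce built as reduce-scatter + allgather.
 * Assumes MPI_SUM over doubles and that count % comm_size == 0. */
void sketch_allreduce_sum(const double *sendbuf, double *recvbuf,
                          int count, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    int chunk = count / size;                 /* simplifying assumption */
    int *counts = malloc(size * sizeof(int));
    double *partial = malloc(chunk * sizeof(double));
    for (int i = 0; i < size; ++i)
        counts[i] = chunk;

    /* Phase 1: every rank ends up owning the fully reduced values
     * for its own chunk of the vector. */
    MPI_Reduce_scatter((void *)sendbuf, partial, counts,
                       MPI_DOUBLE, MPI_SUM, comm);

    /* Phase 2: collect each rank's reduced chunk on every rank. */
    MPI_Allgather(partial, chunk, MPI_DOUBLE,
                  recvbuf, chunk, MPI_DOUBLE, comm);

    free(partial);
    free(counts);
}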
Even the Alltoall can be optimized, but unfortunately not as much as the other collectives. This collective is considered the most difficult to implement and to optimize, even in homogeneous environments. Another interesting paper on alltoall is http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel4/71/13981/00642949.pdf?arnumber=642949
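For a flavor of what the point-to-point schedules inside a tuned alltoall look like, here is a hedged sketch of the classic shifted (ring-style) exchange: size-1 rounds of MPI_Sendrecv, each round pairing every rank with a different partner. Again, only an illustration of one well-known schedule, not necessarily what Open MPI picks, and it assumes equal-sized blocks of doubles.

#include <mpi.h>

/* Illustration only: alltoall as size-1 rounds of MPI_Sendrecv.
 * Each rank holds `size` blocks of `count` doubles in sendbuf/recvbuf. */
void sketch_alltoall(const double *sendbuf, double *recvbuf,
                     int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Our own block needs no communication. */
    for (int i = 0; i < count; ++i)
        recvbuf[rank * count + i] = sendbuf[rank * count + i];

    for (int step = 1; step < size; ++step) {
        int dst = (rank + step) % size;          /* block we send out */
        int src = (rank - step + size) % size;   /* block we receive  */
        MPI_Sendrecv((void *)&sendbuf[dst * count], count, MPI_DOUBLE, dst, 0,
                     &recvbuf[src * count], count, MPI_DOUBLE, src, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}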
Is there anything more complex than an alltoall? Well, you can take a look, in the same family of collective communication patterns, at alltoall[vwx].
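As a reminder of why those variants are even harder to schedule, here is a small, self-contained MPI_Alltoallv example. The sizes are purely illustrative (every rank sends dest+1 doubles to rank dest); the point is the per-destination counts and displacements the implementation has to cope with.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendcounts = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    int *sdispls    = malloc(size * sizeof(int));
    int *rdispls    = malloc(size * sizeof(int));

    int stotal = 0, rtotal = 0;
    for (int i = 0; i < size; ++i) {
        sendcounts[i] = i + 1;        /* send i+1 doubles to rank i      */
        recvcounts[i] = rank + 1;     /* so we receive rank+1 from each  */
        sdispls[i] = stotal;  stotal += sendcounts[i];
        rdispls[i] = rtotal;  rtotal += recvcounts[i];
    }

    double *sendbuf = malloc(stotal * sizeof(double));
    double *recvbuf = malloc(rtotal * sizeof(double));
    for (int i = 0; i < stotal; ++i)
        sendbuf[i] = (double)rank;

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                  recvbuf, recvcounts, rdispls, MPI_DOUBLE, MPI_COMM_WORLD);

    free(sendbuf); free(recvbuf);
    free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}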
george.

On Mar 13, 2008, at 9:29 AM, Brock Palen wrote:
Yeah, I know what you mean: if you have to use an all-to-all, use MPI_Alltoall(), don't roll your own.

So on paper, alltoall at first glance appears to be: n*(n-1) -> n^2 - n -> n^2 (for large n). Allreduce appears to be simpler: n point-to-points followed by a bcast(), which can be simplified to gather + bcast (see the sketch after the quoted thread). Last I knew, MPI_Bcast() was log(n) and gather is n, so for allreduce I get n + log(n). I guess I am confused about how to get alltoall() down from n^2.

Thanks.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Mar 12, 2008, at 6:05 PM, Aurélien Bouteiller wrote:

If you can avoid them, it is better to avoid them. However, it is always better to use MPI_Alltoall than coding your own all-to-all with point-to-point, and in some algorithms you *need* to make an all-to-all communication. What you should understand by "avoid all to all" is not "avoid MPI_Alltoall", but "choose a mathematical algorithm that does not need an all-to-all". The algorithmic complexity of Allreduce is the same as Alltoall.

Aurelien

On Mar 12, 2008, at 5:01 PM, Brock Palen wrote:

I have always been told that calls like MPI_Barrier(), MPI_Allreduce() and MPI_Alltoall() should be avoided. I understand MPI_Alltoall(), as it does n*(n-1) sends and thus grows very quickly. MPI_Barrier() is very latency sensitive and generally is not needed in most cases I have seen it used. But why MPI_Allreduce()? What other functions should generally be avoided? Sorry, this is kinda off topic for the list :-)

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
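For comparison with the discussion above, the naive reduce-then-broadcast allreduce that Brock describes can be sketched in a couple of lines. It is correct, but it funnels the whole vector through one root and gives up the O(log(size)) + O(count) behaviour of the better algorithms (the function name and the SUM-over-doubles choice are just for illustration).

#include <mpi.h>

/* Illustration only: the naive allreduce -- reduce to a root, then broadcast.
 * Everything goes through rank 0, unlike the reduce-scatter/allgather scheme. */
void naive_allreduce_sum(const double *sendbuf, double *recvbuf,
                         int count, MPI_Comm comm)
{
    MPI_Reduce((void *)sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Bcast(recvbuf, count, MPI_DOUBLE, 0, comm);
}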