I understand avoiding MPI_Alltoall(): every one of the n processes sends to each of the other n-1, so it takes n*(n-1) sends and grows quadratically with job size. MPI_Barrier() is very latency sensitive, and in most of the cases where I have seen it used it was not actually needed.
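To put numbers on the n*(n-1) figure, here is a small sketch that only does the arithmetic. It assumes the naive pairwise exchange described above (every rank sends one message to every other rank). Real MPI libraries often pick a different alltoall algorithm depending on message size and process count, so treat this as a rough upper-bound illustration. The function names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of point-to-point messages in a naive all-to-all among n ranks:
   each of the n ranks sends one message to each of the other n - 1 ranks.
   (Hypothetical helper; assumes the naive pairwise algorithm only.) */
uint64_t alltoall_messages(uint64_t n)
{
    return n == 0 ? 0 : n * (n - 1);
}

/* Print the message count for a few job sizes to show the quadratic growth. */
void print_alltoall_growth(void)
{
    const uint64_t sizes[] = {8, 64, 512, 4096};
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        printf("%6llu ranks -> %10llu messages\n",
               (unsigned long long)sizes[i],
               (unsigned long long)alltoall_messages(sizes[i]));
    }
}
```

For example, going from 64 ranks (4,032 messages) to 4096 ranks gives 16,773,120 messages, roughly 4000x more traffic for 64x more processes.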
But why MPI_Allreduce()? What other functions should generally be avoided?

Sorry, this is kinda off topic for the list. :-)

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985