"Iliev, Hristo" <il...@rz.rwth-aachen.de> writes: > Hi Dave, > > Is it MPI_ALLTOALL or MPI_ALLTOALLV that runs slower?
Well, the output says MPI_ALLTOALL, but this prompted me to check, and
it turns out that it's lumping both together.

> If it is the latter, the reason could be that the default
> implementation of MPI_ALLTOALLV in 1.6.5 is different from that in
> 1.5.4. To switch back to the previous one, use:
>
> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1

Yes, that does it.

Can someone comment generally on the situations in which the new
default wins? I suspect that where I'm seeing it lose (on dual-socket
Sandy Bridge, QDR IB) is representative of a lot of chemistry code,
which tends to be a/the major consumer of academic HPC cycles. If so,
this probably merits an FAQ entry.

> The logic that selects the MPI_ALLTOALL implementation is the same in
> both versions, although the pairwise implementation in 1.6.5 is a bit
> different. The difference should have a negligible effect, though.
>
> Note that coll_tuned_use_dynamic_rules has to be enabled in order for
> the MCA parameters that allow you to select the algorithms to be
> registered.

Ah, thanks. This now seems familiar, but still obscure.

> Therefore you have to use ompi_info as follows:
>
>   ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned
>
> Hope that helps!

Yes, thanks!
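For anyone finding this thread in the archives later: a complete launch
line with those switches would look something like the sketch below (the
process count and executable name are placeholders; the two --mca options
are the ones quoted above):

  mpirun -np 16 \
      --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_alltoallv_algorithm 1 \
      ./my_app

The same parameters can, as usual, also be set through OMPI_MCA_*
environment variables or an mca-params.conf file rather than on the
command line.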