"Iliev, Hristo" <il...@rz.rwth-aachen.de> writes:

> Hi Dave,
>
> Is it MPI_ALLTOALL or MPI_ALLTOALLV that runs slower?

Well, the output says MPI_ALLTOALL, but this prompted me to check, and
it turns out that it's lumping both together.

> If it is the latter,
> the reason could be that the default implementation of MPI_ALLTOALLV in
> 1.6.5 is different from that in 1.5.4. To switch back to the previous one,
> use:
>
> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1

Yes, that does it.
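
(For the record, I'm passing those on the mpirun command line, roughly like
the following; the process count and executable name are just placeholders:

  # -np 64 and ./app stand in for the real job
  mpirun -np 64 \
      --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_alltoallv_algorithm 1 \
      ./app

Setting the corresponding OMPI_MCA_* environment variables before the run
should have the same effect, if I understand the MCA parameter system
correctly.)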

Can someone comment generally on the situations in which the new default
wins?

I suspect where I'm seeing it lose (on dual-socket sandybridge, QDR IB)
is representative of a lot of chemistry code, which tends to be a major
consumer of academic HPC cycles.  If so, this probably merits an
FAQ entry.

> The logic that selects the MPI_ALLTOALL implementation is the same in both
> versions, although the pairwise implementation in 1.6.5 is a bit different.
> The difference should have negligible effects though.
>
> Note that coll_tuned_use_dynamic_rules has to be enabled in order for the
> MCA parameters that allow you to select the algorithms to be registered.

Ah, thanks.  This now seems familiar, but still obscure.

> Therefore you have to use ompi_info as follows:
>
> ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned
>
> Hope that helps!

Yes, thank you!
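
(For anyone else wading through that output later: piping it through grep,
along the lines of

  ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned \
      | grep alltoall

should narrow things down to the alltoall-related parameters, if I'm not
mistaken.)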
