-blocking reduction algorithm and its parameters?
Something like the ones we have for the blocking versions, for instance:
"coll_tuned_allreduce_algorithm", "coll_tuned_reduce_algorithm", etc.
--
Felipe
2015-11-27 18:20 GMT-02:00 Ralph Castain:
> One thing you might want
=
[RESULT] Reduce time = 17.587828
[RESULT] Total time = 20.655875
==
Intel MPI + non-blocking:
==
[RESULT] Reduce time = 49.483195
[RESULT] Total time = 52.642514
==
Thanks in advance.
2015-11-27 14:57 GMT-02:00 Fe
he non-blocking was about five times slower. I tried Intel MPI, and there
the non-blocking version was about 3 times slower instead of 5.
Question 1: Do you think that all this overhead makes sense?
Question 2: Why is there so much overhead for non-blocking collective calls?
Question 3: Can I change the algorithm used by the non-blocking allreduce
to improve this?
Best regards,
--
Felipe